This paper first focuses on deriving an alternative approach for proving an extremal entropy inequality
(EEI), originally presented in [11]. The proposed approach does not rely on the channel enhancement
technique, and has the advantage that it yields an explicit description of the optimal solution as opposed
to the implicit approach of [11]. Compared with the proofs in [11], the proposed alternative proof is also
simpler, more direct, more information-theoretic, and has the additional advantage that it offers a new
perspective for establishing novel as well as known challenging results such as the capacity of the vector
Gaussian broadcast channel, the lower bound of the achievable rate for distributed source coding with
a single quadratic distortion constraint, and the secrecy capacity of the Gaussian wire-tap channel. The
second part of this paper is devoted to some novel applications of the proposed mathematical results.
The proposed mathematical techniques are further exploited to obtain a more simplified proof of the
EEI without using the entropy power inequality (EPI), to build the optimal solution for a special class
of broadcasting channels with private messages, and to obtain a mutual information-based performance
bound for the mean square-error of a linear Bayesian estimator of a Gaussian source embedded in an
additive noise channel.
I. Introduction

The classical entropy power inequality (EPI) was first established by Shannon [1]. Due to its importance
and usefulness, the EPI was proved by several different authors using distinct methods. In [2], Stam provided
the first rigorous proof, and Stam's proof was further simplified by Blachman [3] and Dembo et al. [4],
respectively. Verdú and Guo proposed a new proof of the EPI based on the I-MMSE concept [5]. Most
recently, Rioul proved the EPI based only on information theoretic quantities [6]. Before Rioul's proof,
most of the reported proofs were based on de Bruijn-type identities and Fisher information inequality, 
i.e., the previous proofs were conducted mainly via an estimation-theoretic approach rather than an 
information-theoretic approach. 

Due to the significance of the EPI, numerous versions of EPIs such as Costa's EPI fj), the EPI for 
dependent random variables O, and the extremal entropy inequality (EEI) [11] have been proposed. 
Among the EPIs, the extremal entropy inequality is especially prominent since it can be adapted to 
several important applications investigated recently in the wireless communications area. In [11], Liu and
Viswanath proposed the extremal entropy inequality, motivated by multi-terminal information theoretic 
problems such as the vector Gaussian broadcast channel and the distributed source coding with a single 
quadratic distortion constraint, and suggested several applications for the extremal entropy inequality. The 
EEI is an entropy power inequality which includes a covariance constraint. Because of the covariance 
constraint, the EEI could not be proved directly by using the classical EPI. Therefore, a powerful
technique, referred to as the channel enhancement technique [12], was adopted in the proofs reported in
[11].

The proofs proposed in [11] proceed as follows. First, the extremal entropy inequality is cast as an
optimization problem. Using the channel enhancement technique, which relies mainly on Karush-Kuhn- 
Tucker (KKT) conditions, an alternative optimization problem, whose maximum value is larger than the 
maximum value of the original problem, is proposed, and the alternative problem is solved using the 
EPI. Finally, the proof is completed by showing that the maximum value of the alternative problem is 
equal to the maximum value of the original problem. Even though Liu and Viswanath proposed two
kinds of proofs, a direct proof and a perturbation proof, both proofs are based on the channel
enhancement technique, and they are derived in a similar way, except that de Bruijn's identity is used in
the perturbation proof.

The main theme of this paper is to develop a novel mathematical framework to prove the extremal
entropy inequality without using the channel enhancement technique. Since the channel enhancement
technique is used to prove not only the extremal entropy inequality but also the capacity of several
different kinds of Gaussian channels, e.g., the capacity of the Gaussian broadcast channel and the secrecy
capacity of the Gaussian wire-tap channel, by finding an alternative proof for the extremal entropy
inequality, one can also find novel techniques to calculate the capacity of the Gaussian broadcast channel,
the secrecy capacity of the Gaussian wire-tap channel, etc. More important is the fact that the mathematical
framework and tools developed in the first part of this paper to achieve an alternative proof of the extremal
entropy inequality without using the channel enhancement technique are exploited in the second part of
this paper to achieve a second proof of the EEI (a more simplified proof of the EEI that uses
neither the EPI nor the worst additive noise lemma), to obtain the optimal solution of a special class
of broadcasting problems that assume a private message, and to characterize the minimum mean-square
error (MMSE) performance of linear Bayesian estimators of a Gaussian source in additive noise channels.

The first proof of the EEI, proposed in the first part of this paper, exploits mainly four techniques: 
the data processing inequality, the moment generating function, the worst additive noise lemma, and the 
classical EPI. By using the data processing inequality, the worst additive noise lemma, and the classical 
EPI, an upper bound is calculated. Then, by applying the equality condition of the data processing 
inequality, we prove that the upper bound can be achieved. The moment generating functions are 
implemented to prove the achievement of the equality condition in the data processing inequality. The 
second proposed proof of the EEI relies partly on the techniques and tools proposed in the first proof
of the EEI, and it is further simplified in the sense that it relies neither on the EPI nor on the
worst additive noise lemma.

The contributions of our proof can be summarized as follows. In the first part of this paper, a first
alternative proof of the EEI is proposed, and it is shown to be simpler and more direct compared with the
proofs in [11]. The proposed proof yields a more information-theoretic approach without using the KKT
conditions. The proposed approach relies on the data processing inequality, and the moment generating
function helps to circumvent the step of using the KKT conditions. Moreover, by simply analyzing some
properties of positive semi-definite matrices, one can bypass the step of proving the existence of the
optimal solution which satisfies the KKT conditions, a step which is very complicated to accomplish.
In addition, the structure of the covariance matrix of the optimal solution is described in detail by
using properties of positive semi-definite matrices. Therefore, the proposed approach yields an explicit
description of the optimal solution as opposed to the implicit solutions in [11]. Furthermore, the proposed
proof presents a novel investigation method not only for the extremal inequality but also for applications
such as the capacity of the Gaussian broadcast channel, the secrecy capacity of the Gaussian wire-tap channel,
and so on. In the second part of this paper, the tools and mathematical approach used in the first part
of the paper to prove the EEI are further simplified to obtain a second alternative proof of the EEI
without using the EPI or the worst additive noise lemma. Two additional applications of the proposed
results, finding the optimal signaling scheme for a broadcasting problem with a private message and
characterizing the MMSE performance of linear Bayesian estimation schemes for Gaussian sources in
additive noise channels, are described as well. These applications support the usefulness of the developed
mathematical results and the versatility of the extremal entropy inequality.

The rest of this paper is organized as follows. The extremal entropy inequality without a covariance
constraint and its alternative proof are presented in Section II. The extremal entropy inequality and its
first alternative proof, which are the main results of this paper, are described in Section III. In Section
IV, several novel applications of the EEI are introduced, including a second much simplified alternative
proof of the EEI, to illustrate the usefulness and relevance of the developed mathematical framework and
results. Finally, Section V concludes this paper.

A. Notations 

Throughout this paper, random vectors are denoted by capital letters such as $X$ and $Y$, matrices are
represented by bold capital letters such as $\Sigma$ and $R$, and $n$ and $n$-by-$n$ denote the dimension (size)
of a random vector and a matrix, respectively. All information theoretic quantities are represented by
conventional notations. For example, $h(X)$ and $I(X; Y)$ stand for differential entropy of a random vector
$X$ and mutual information between random vector $X$ and random vector $Y$, respectively. Conditional
entropy and conditional mutual information are denoted as $h(X|Y)$ and $I(X; Y|Z)$, respectively. The
notation $\preceq$ or $\prec$ stands for positive (semi)definite partial ordering between matrices, i.e., $\Sigma_1 \preceq \Sigma_2$ means
$\Sigma_2 - \Sigma_1$ is a positive semidefinite matrix [26]. In this paper, a positive definite matrix means a strictly
positive definite matrix, and $\nabla_{\Sigma}$ stands for the Jacobian matrix with respect to $\Sigma$. The matrix $I$ denotes
an $n$-by-$n$ identity matrix, and the matrix $0$ stands for an $n$-by-$n$ zero matrix. Notation $E[\cdot]$ denotes an
expectation with respect to all random vectors inside $[\cdot]$, and $M_X(S)$ and $M_{X|Y}(S)$ stand for the moment
generating functions of random vector $X$ and random vector $X$ given $Y$, respectively. For simplicity, log
denotes the natural logarithm.

II. Entropy Power Inequality 

Since the extremal entropy inequality is similar to the classical entropy power inequality, we first 
investigate a relationship between the EEI and the EPI. Without a covariance constraint, the EEI is 
equivalent to the EPI as shown in Theorem 1.

Theorem 1: For an arbitrary random vector $X$ with a covariance matrix $\Sigma_X$ and a Gaussian random
vector $W_G$ with a covariance matrix $\Sigma_W$, there exists a Gaussian random vector $X_G$ which satisfies the
following inequality:

$$h(X) - \mu h(X + W_G) \le h(X_G) - \mu h(X_G + W_G), \tag{1}$$

where the constant $\mu \ge 1$, all random vectors are independent of each other, $\Sigma_W$ is a positive definite
matrix, and $X_G$ is a Gaussian random vector which satisfies the following:

1) The covariance matrix of $X_G$ is represented by $\Sigma_{X_G}$, and it is proportional to $\Sigma_W$.

2) The differential entropy of $X_G$, $h(X_G)$, is equal to the differential entropy of $X$, $h(X)$.
In addition, the inequality (1) is equivalent to the EPI.

Proof: 

Lemma 1 (Entropy Power Inequality Ml/. /I25l/).- For independent random vectors Xi and X2, 

$$h(X_1 + X_2) \ge h(X_{G_1} + X_{G_2}), \tag{2}$$

where $X_{G_1}$ and $X_{G_2}$ are independent Gaussian random vectors, $h(X_{G_1}) = h(X_1)$ and $h(X_{G_2}) = h(X_2)$,
and the covariance matrices of $X_{G_1}$ and $X_{G_2}$ are proportional.
Using Lemma 1, the following relations are obtained:

$$h(X) = h(X_G),$$

$$h(X + W_G) \ge h(X_G + W_G), \tag{3}$$

where $\Sigma_{X_G}$ is proportional to $\Sigma_W$, i.e., $\Sigma_{X_G} = \alpha \Sigma_W$, and $\alpha$ is an appropriate constant which satisfies
$h(X) = h(X_G)$. Therefore, the inequality (1) is derived from Lemma 1, the EPI, and the proof of the
inequality (1) is completed.

If the inequality (1) holds, $h(X + W_G) \ge h(X_G + W_G)$ since $h(X) = h(X_G)$, and $\Sigma_{X_G}$ is proportional
to $\Sigma_W$. This is exactly the same as the EPI in Lemma 1. Therefore, the inequality (1) is equivalent to
the EPI. ■
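The constant $\alpha$ in the proof above is fixed by the entropy-matching condition $h(X) = h(X_G)$. The following is a minimal numerical sketch of that step, not part of the original derivation. It assumes NumPy is available; the helper names `gaussian_entropy` and `matching_alpha`, the uniform-cube source, and the example covariance are illustrative assumptions.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (in nats) of a Gaussian vector with covariance cov."""
    n = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (n * np.log(2.0 * np.pi * np.e) + logdet)

def matching_alpha(h_x, cov_w):
    """Return alpha such that a Gaussian with covariance alpha * cov_w has entropy h_x."""
    n = cov_w.shape[0]
    _, logdet_w = np.linalg.slogdet(cov_w)
    # Solve (n/2) log(2*pi*e*alpha) + (1/2) log|cov_w| = h_x for alpha.
    return np.exp((2.0 * h_x - logdet_w) / n) / (2.0 * np.pi * np.e)

# Hypothetical source: X uniform on the cube [0, side]^n, so h(X) = n * log(side).
side, n = 2.0, 2
h_x = n * np.log(side)
cov_w = np.array([[1.0, 0.3], [0.3, 2.0]])  # example noise covariance (assumption)

alpha = matching_alpha(h_x, cov_w)
h_xg = gaussian_entropy(alpha * cov_w)  # entropy of X_G with covariance alpha * cov_w
```

Because $\alpha$ depends on $h(X)$, the resulting bound changes with the input distribution.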

While Theorem 1 shows a local upper bound, i.e., the upper bound is dependent on a random vector
$X$, since $\alpha$ depends on the random vector $X$, we can also find a global upper bound as shown in Theorem
2 and the reference [11].

Theorem 2: For an arbitrary random vector $X$ with a covariance matrix $\Sigma_X$ and a Gaussian random
vector $W_G$ with a covariance matrix $\Sigma_W$, there exists a Gaussian random vector $X_G^*$ which satisfies the
following inequalities:

$$h(X) - \mu h(X + W_G) \le h(X_G^*) - \mu h(X_G^* + W_G), \tag{4}$$

$$h(X_G) - \mu h(X_G + W_G) \le h(X_G^*) - \mu h(X_G^* + W_G), \tag{5}$$

where the constant $\mu > 1$, all random vectors are independent of each other, $\Sigma_W$ is a positive definite
matrix, $X_G$ stands for the Gaussian random vector defined in Theorem 1, and $X_G^*$ is a Gaussian random
vector whose covariance matrix $\Sigma_{X^*}$ is represented by $(\mu - 1)^{-1} \Sigma_W$.

Proof: The proof, here, is a little different from the proof in [11]. In our proof, we deal with both
a local upper bound and a global upper bound while a global upper bound is directly calculated in [11].
Define the function $f(\alpha)$ as follows:

$$f(\alpha) = h(X_G) - \mu h(X_G + W_G) = \frac{n}{2} \log 2\pi e \left|\alpha \Sigma_W\right|^{\frac{1}{n}} - \frac{\mu n}{2} \log 2\pi e \left|\alpha \Sigma_W + \Sigma_W\right|^{\frac{1}{n}}, \tag{6}$$

where $n$ denotes the dimension of a random vector, and $|\cdot|$ stands for the determinant of a matrix.
Since $f(\alpha)$ is unimodal, and

$$\left.\frac{d f(\alpha)}{d\alpha}\right|_{\alpha = (\mu-1)^{-1}} = \frac{n}{2(\mu-1)^{-1}} - \frac{\mu n}{2\left((\mu-1)^{-1} + 1\right)} = 0,$$

$$\left.\frac{d^2 f(\alpha)}{d\alpha^2}\right|_{\alpha = (\mu-1)^{-1}} = -\frac{n}{2(\mu-1)^{-2}} + \frac{\mu n}{2\left((\mu-1)^{-1} + 1\right)^2} < 0, \tag{7}$$

$f(\alpha)$ is maximized when $\alpha = (\mu - 1)^{-1}$.

Therefore, from Theorem 1, the following inequality is derived as

$$h(X) - \mu h(X + W_G) \le h(X_G) - \mu h(X_G + W_G)$$
$$= f(\alpha)$$
$$\le f\left((\mu - 1)^{-1}\right)$$
$$= h(X_G^*) - \mu h(X_G^* + W_G). \tag{8}$$

The inequalities (8) include inequalities (4) and (5), and the validity of inequalities (4) and (5) is
proved. The upper bound in (8) is a global maximum while the upper bound derived in Theorem 1 is a
local maximum.
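As a sanity check on the maximizer claimed in (7), the function $f(\alpha)$ in (6) can be evaluated on a grid. The sketch below assumes NumPy; the values of `mu` and `cov_w` and the grid range are arbitrary illustrative choices, and the grid search is only a numerical illustration, not a replacement for the calculus argument.

```python
import numpy as np

def f(alpha, mu, cov_w):
    """f(alpha) = h(X_G) - mu * h(X_G + W_G) with Cov(X_G) = alpha * cov_w, as in (6)."""
    n = cov_w.shape[0]
    _, logdet_w = np.linalg.slogdet(cov_w)
    h_xg = 0.5 * (n * np.log(2.0 * np.pi * np.e * alpha) + logdet_w)
    h_sum = 0.5 * (n * np.log(2.0 * np.pi * np.e * (alpha + 1.0)) + logdet_w)
    return h_xg - mu * h_sum

mu = 3.0                                     # example value with mu > 1 (assumption)
cov_w = np.array([[1.0, 0.2], [0.2, 0.5]])   # example noise covariance (assumption)

alphas = np.linspace(0.01, 5.0, 50000)       # search grid for alpha
values = f(alphas, mu, cov_w)
alpha_grid = alphas[np.argmax(values)]       # grid maximizer
alpha_star = 1.0 / (mu - 1.0)                # maximizer claimed in the proof
```

Up to the grid resolution, `alpha_grid` should agree with `alpha_star`, whatever positive definite `cov_w` is used, because $|\Sigma_W|$ only shifts $f$ by a constant.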

Remark 1: When $\mu = 1$, the inequalities (4) and (5) are also satisfied. However, we cannot specify
the covariance matrix of $X_G^*$ since $h(X_G^*) - \mu h(X_G^* + W_G)$ is increasing with respect to $\Sigma_{X^*}$, and
$\Sigma_{X^*}$ can be made arbitrarily large. Therefore, we omit the case when $\mu = 1$ in Theorem 2.

As shown in Theorems 1 and 2, for $\mu > 1$, $h(X) - \mu h(X + W_G)$ is maximized when random vector
$X$ is Gaussian. However, when a covariance constraint is added in the inequalities (1), (4), and (5), we
cannot prove whether a Gaussian random vector still maximizes $h(X) - \mu h(X + W_G)$ or not, based on
the same methods as described in the proofs of Theorems 1 and 2, since the covariance constraint may
alter the proportionality relationship between the covariance matrices $\Sigma_{X^*}$ and $\Sigma_W$.

III. The Extremal Entropy Inequality

In [11], Liu and Viswanath proved that a Gaussian random vector still maximizes $h(X) - \mu h(X + W_G)$
even when a covariance constraint is considered. The inequality was formulated as an optimization
problem with a covariance constraint as follows:

$$\max \; h(X + W_G) - \mu h(X + V_G),$$
$$\text{s.t.} \;\; \Sigma_X \preceq R, \tag{9}$$

where $W_G$ and $V_G$ are independent Gaussian random vectors with positive definite covariance matrices
$\Sigma_W$ and $\Sigma_V$, respectively, all random vectors are independent of each other, and the maximization is
done over the distribution of random vector $X$. Two proofs, a direct proof and a perturbation proof,
are provided in [11]. Each proof approaches the problem in a different way but both proofs share an
important common approach, namely the channel enhancement technique based on the KKT conditions,
proposed originally in [12].

Unlike the original proofs in [11], we will prove Theorems 3 and 4 without using the channel
enhancement technique. Before we deal with the problem in (9), we first consider a simpler case of it
next.

Theorem 3: For an arbitrary random vector $X$ with a covariance matrix $\Sigma_X$, a Gaussian random vector
$W_G$ with a covariance matrix $\Sigma_W$, and a positive semi-definite matrix $R$, there exists a Gaussian random
vector $X_G^*$ with a covariance matrix $\Sigma_{X^*}$ which satisfies the following inequality:

$$h(X) - \mu h(X + W_G) \le h(X_G^*) - \mu h(X_G^* + W_G), \tag{10}$$

where the constant $\mu \ge 1$, all random vectors are independent of each other, $\Sigma_W$ is a positive definite
matrix, $\Sigma_X \preceq R$, and $\Sigma_{X^*} \preceq R$.
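To build intuition for the role of the covariance constraint, the following sketch looks at the scalar case restricted to Gaussian inputs only. It is an illustration under stated assumptions, not a proof: it does not address non-Gaussian $X$, and the function names and numerical values are hypothetical. Over variances $v \in (0, r]$, the Gaussian objective $h(X_G) - \mu h(X_G + W_G)$ peaks at $\min\{r, \sigma_W^2/(\mu - 1)\}$.

```python
import numpy as np

def objective(v, mu, w):
    """Scalar Gaussian objective h(X_G) - mu * h(X_G + W_G), Var(X_G) = v, Var(W_G) = w."""
    return 0.5 * np.log(2.0 * np.pi * np.e * v) - 0.5 * mu * np.log(2.0 * np.pi * np.e * (v + w))

def best_variance_on_grid(mu, w, r, num=200001):
    """Grid search for the maximizing variance subject to 0 < v <= r."""
    grid = np.linspace(r / num, r, num)
    return grid[np.argmax(objective(grid, mu, w))]

mu, w = 2.0, 1.0                                 # unconstrained maximizer is w / (mu - 1) = 1.0
v_loose = best_variance_on_grid(mu, w, r=4.0)    # constraint inactive
v_tight = best_variance_on_grid(mu, w, r=0.5)    # constraint active: optimum sits at v = r
```

When the constraint is active, the optimal variance is no longer proportional to the noise variance by the factor $(\mu-1)^{-1}$, which is why the argument of Theorem 2 cannot be reused directly.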

Proof: When $R$ is a positive semi-definite but singular matrix, i.e., $|R| = 0$, the inequality (10) and its
covariance constraints are equivalently changed into

$$h(\bar{X}) - \mu h(\bar{X} + \bar{W}_G) \le h(\bar{X}_G^*) - \mu h(\bar{X}_G^* + \bar{W}_G), \tag{11}$$

where $\bar{X}$ is such that $\Sigma_{\bar{X}} \preceq \bar{R}$, $\Sigma_{\bar{X}^*} \preceq \bar{R}$, and $\bar{R}$ is a positive definite matrix, as mentioned in [11].
When $\mu = 1$, the inequality (10) is easily proved by Lemma 2, which will be presented later.

Therefore, without loss of generality, we assume that $\mu > 1$ and $R$ is a positive definite matrix. Then,
the left-hand side (LHS) of the equation (10) is upper-bounded by means of the following lemma.

Lemma 2 (Worst Additive Noise IHTll . /IT?]/).- For random vectors X, Xq, Wq, and Wq, 

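$$I(X + \tilde{W}_G + \hat{W}_G; \hat{W}_G) \ge I(X_G + \tilde{W}_G + \hat{W}_G; \hat{W}_G), \tag{12}$$

where $X$ is an arbitrary random vector, $X_G$ is a Gaussian random vector with the covariance matrix
identical to that of $X$, $\tilde{W}_G$ and $\hat{W}_G$ are Gaussian random vectors, and all random vectors are independent.
Based on Lemma 2, the following inequalities hold:

$$h(X + \tilde{W}_G + \hat{W}_G) - h(X + \tilde{W}_G + \hat{W}_G | \hat{W}_G) \ge h(X_G + \tilde{W}_G + \hat{W}_G) - h(X_G + \tilde{W}_G + \hat{W}_G | \hat{W}_G)$$

$$\Leftrightarrow \; h(X + \tilde{W}_G + \hat{W}_G) - h(X + \tilde{W}_G) \ge h(X_G + \tilde{W}_G + \hat{W}_G) - h(X_G + \tilde{W}_G) \tag{13}$$

$$\Leftrightarrow \; h(X + \tilde{W}_G + \hat{W}_G) \ge h(X + \tilde{W}_G) + h(X_G + \tilde{W}_G + \hat{W}_G) - h(X_G + \tilde{W}_G), \tag{14}$$

where $\Leftrightarrow$ denotes equivalence. Notice that the Gaussian random vector $W_G$ can be expressed as the
sum of two independent Gaussian random vectors $\tilde{W}_G$ and $\hat{W}_G$ whose covariance matrices satisfy:

$$\Sigma_W = \Sigma_{\tilde{W}} + \Sigma_{\hat{W}}, \tag{15}$$

where $\Sigma_W$, $\Sigma_{\tilde{W}}$, and $\Sigma_{\hat{W}}$ are the covariance matrices of $W_G$, $\tilde{W}_G$, and $\hat{W}_G$, respectively. Henceforth,
the Gaussian random vector $W_G$ is represented as $W_G = \tilde{W}_G + \hat{W}_G$.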

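Based on (14) and (15), the left-hand side (LHS) of the equation (10) is upper-bounded as follows:

$$h(X) - \mu h(X + W_G) = h(X) - \mu h(X + \tilde{W}_G + \hat{W}_G) \tag{16}$$

$$\le h(X) - \mu \left( h(X + \tilde{W}_G) + h(X_G + \tilde{W}_G + \hat{W}_G) - h(X_G + \tilde{W}_G) \right) \tag{17}$$

$$= h(X) - \mu h(X + \tilde{W}_G) + \mu \left( h(X_G + \tilde{W}_G) - h(X_G + \tilde{W}_G + \hat{W}_G) \right). \tag{18}$$

Using Theorem 2, if $(\mu - 1)^{-1} \Sigma_{\tilde{W}} \preceq R$, the right-hand side (RHS) of equation (18) is upper-bounded as follows:

$$h(X) - \mu h(X + \tilde{W}_G) + \mu \left( h(X_G + \tilde{W}_G) - h(X_G + \tilde{W}_G + \hat{W}_G) \right) \tag{19}$$

$$\le h(X_G^*) - \mu h(X_G^* + \tilde{W}_G) + \mu \left( h(X_G + \tilde{W}_G) - h(X_G + \tilde{W}_G + \hat{W}_G) \right), \tag{20}$$

where $X_G^*$ is a Gaussian random vector whose covariance matrix $\Sigma_{X^*}$ is defined as $(\mu - 1)^{-1} \Sigma_{\tilde{W}}$. Unlike
Theorem 2, we additionally have to prove that there exists a random vector $X_G^*$ whose covariance matrix
$\Sigma_{X^*}$ satisfies

$$\Sigma_{X^*} = (\mu - 1)^{-1} \Sigma_{\tilde{W}}, \tag{21}$$

$$\Sigma_{X^*} \preceq R, \tag{22}$$

due to the covariance constraint. Since $\Sigma_X \preceq R$, we will prove there exists a random vector $X_G^*$ whose
covariance matrix $\Sigma_{X^*}$ satisfies

$$\Sigma_{X^*} = (\mu - 1)^{-1} \Sigma_{\tilde{W}}, \tag{23}$$

$$\Sigma_{X^*} \preceq \Sigma_X, \tag{24}$$

instead of proving (22).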

Equation (20) is further processed by making use of the following lemma.

Lemma 3 (Data Processing Inequality [25]): When three random vectors $Y_1$, $Y_2$, and $Y_3$ represent a
Markov chain $Y_1 \to Y_2 \to Y_3$, the following inequality is satisfied:

$$I(Y_1; Y_3) \le I(Y_1; Y_2). \tag{25}$$

The equality holds if and only if random vectors $Y_1$, $Y_2$, and $Y_3$ form the Markov chain $Y_1 \to Y_3 \to Y_2$.
If the inequality (24) is satisfied, then we can form the Markov chain:

$$\hat{X}_G \to \hat{X}_G + X_G^* + \tilde{W}_G \to \hat{X}_G + X_G^* + \tilde{W}_G + \hat{W}_G, \tag{26}$$

where all random vectors are independent. Since a Gaussian random vector $X_G$ can be expressed as the
summation of two independent Gaussian random vectors $\hat{X}_G$ and $X_G^*$ whose covariance matrices satisfy

$$\Sigma_X = \Sigma_{\hat{X}} + \Sigma_{X^*}, \tag{27}$$

where $\Sigma_X$, $\Sigma_{\hat{X}}$, and $\Sigma_{X^*}$ stand for covariance matrices of $X_G$, $\hat{X}_G$, and $X_G^*$, respectively, the Gaussian
random vector $X_G$ will be represented as $X_G = \hat{X}_G + X_G^*$.
Based on Lemma 3, we obtain

$$I(\hat{X}_G; \hat{X}_G + X_G^* + \tilde{W}_G + \hat{W}_G) \le I(\hat{X}_G; \hat{X}_G + X_G^* + \tilde{W}_G) \tag{28}$$

$$\Leftrightarrow \; h(\hat{X}_G + X_G^* + \tilde{W}_G + \hat{W}_G) - h(X_G^* + \tilde{W}_G + \hat{W}_G) \le h(\hat{X}_G + X_G^* + \tilde{W}_G) - h(X_G^* + \tilde{W}_G) \tag{29}$$

$$\Leftrightarrow \; h(X_G + \tilde{W}_G + \hat{W}_G) - h(X_G^* + \tilde{W}_G + \hat{W}_G) \le h(X_G + \tilde{W}_G) - h(X_G^* + \tilde{W}_G) \tag{30}$$

$$\Leftrightarrow \; h(X_G^* + \tilde{W}_G) - h(X_G + \tilde{W}_G) + h(X_G + \tilde{W}_G + \hat{W}_G) \le h(X_G^* + \tilde{W}_G + \hat{W}_G). \tag{31}$$

The equivalence in (30) is due to $X_G = \hat{X}_G + X_G^*$.
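The chain of equivalences from the data processing inequality to the entropy form can be checked numerically in the scalar Gaussian case. The sketch below assumes NumPy; the variance values and the variable names (`var_xhat` for the first Gaussian component of $X_G$, `var_wt` and `var_wh` for the two noise components) are illustrative assumptions.

```python
import numpy as np

def h_gauss(var):
    """Differential entropy (nats) of a scalar Gaussian with variance var."""
    return 0.5 * np.log(2.0 * np.pi * np.e * var)

# Illustrative variances: first and second component of X_G, then the two noise parts.
var_xhat, var_xstar, var_wt, var_wh = 1.5, 0.5, 1.0, 0.7
var_xg = var_xhat + var_xstar  # X_G is the sum of its two independent components

# Mutual informations at the far and near ends of the Markov chain.
mi_far = h_gauss(var_xg + var_wt + var_wh) - h_gauss(var_xstar + var_wt + var_wh)
mi_near = h_gauss(var_xg + var_wt) - h_gauss(var_xstar + var_wt)

# Rearranged entropy form, i.e., the last line of the derivation.
lhs_31 = h_gauss(var_xstar + var_wt) - h_gauss(var_xg + var_wt) + h_gauss(var_xg + var_wt + var_wh)
rhs_31 = h_gauss(var_xstar + var_wt + var_wh)
```

With independent components the far-end mutual information is strictly smaller here, so the rearranged inequality is strict. Equality requires the reversed Markov chain discussed next in the text.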
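Even though we need an upper bound of the RHS term in equation (20), the equation (31) generates
a lower bound for the equation (20) as follows:

$$h(X_G^*) - \mu h(X_G^* + \tilde{W}_G) + \mu \left( h(X_G + \tilde{W}_G) - h(X_G + \tilde{W}_G + \hat{W}_G) \right) \tag{32}$$

$$\ge h(X_G^*) - \mu h(X_G^* + \tilde{W}_G + \hat{W}_G) \tag{33}$$

$$\ge h(X_G^*) - \mu h(X_G^* + W_G). \tag{34}$$

However, if we can construct the following Markov chain:

$$\hat{X}_G \to \hat{X}_G + X_G^* + \tilde{W}_G + \hat{W}_G \to \hat{X}_G + X_G^* + \tilde{W}_G, \tag{35}$$

and using Lemma 3 again, it turns out that

$$I(\hat{X}_G; \hat{X}_G + X_G^* + \tilde{W}_G + \hat{W}_G) \ge I(\hat{X}_G; \hat{X}_G + X_G^* + \tilde{W}_G), \tag{36}$$

and this inequality leads us to a tight upper bound. Indeed,

$$I(\hat{X}_G; \hat{X}_G + X_G^* + \tilde{W}_G + \hat{W}_G) \ge I(\hat{X}_G; \hat{X}_G + X_G^* + \tilde{W}_G) \tag{37}$$

$$\Leftrightarrow \; h(\hat{X}_G + X_G^* + \tilde{W}_G + \hat{W}_G) - h(X_G^* + \tilde{W}_G + \hat{W}_G) \ge h(\hat{X}_G + X_G^* + \tilde{W}_G) - h(X_G^* + \tilde{W}_G) \tag{38}$$

$$\Leftrightarrow \; h(X_G + \tilde{W}_G + \hat{W}_G) - h(X_G^* + \tilde{W}_G + \hat{W}_G) \ge h(X_G + \tilde{W}_G) - h(X_G^* + \tilde{W}_G) \tag{39}$$

$$\Leftrightarrow \; h(X_G^* + \tilde{W}_G) - h(X_G + \tilde{W}_G) + h(X_G + \tilde{W}_G + \hat{W}_G) \ge h(X_G^* + \tilde{W}_G + \hat{W}_G). \tag{40}$$

The equivalence in (39) is due to $X_G = \hat{X}_G + X_G^*$.

Now using (40), the equations (19) and (20) are upper-bounded as follows:

$$h(X) - \mu h(X + \tilde{W}_G) + \mu \left( h(X_G + \tilde{W}_G) - h(X_G + \tilde{W}_G + \hat{W}_G) \right) \tag{41}$$

$$\le h(X_G^*) - \mu h(X_G^* + \tilde{W}_G) + \mu \left( h(X_G + \tilde{W}_G) - h(X_G + \tilde{W}_G + \hat{W}_G) \right) \tag{42}$$

$$\le h(X_G^*) - \mu h(X_G^* + \tilde{W}_G + \hat{W}_G) \tag{43}$$

$$= h(X_G^*) - \mu h(X_G^* + W_G), \tag{44}$$

and this is exactly the same as the equation (34). Therefore, the following equality is satisfied:

$$h(X_G^*) - \mu h(X_G^* + \tilde{W}_G) + \mu \left( h(X_G + \tilde{W}_G) - h(X_G + \tilde{W}_G + \hat{W}_G) \right) = h(X_G^*) - \mu h(X_G^* + \tilde{W}_G + \hat{W}_G), \tag{45}$$

due to (34) and (44). Now, we will prove that we can actually construct the Markov chain (35) using the
following lemmas.
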
Lemma 4: For independent random vectors $Y_1$ and $Y_2$, the following equality between moment
generating functions (MGFs) is satisfied:

$$M_{Y_1 + Y_2}(S) = M_{Y_1}(S) M_{Y_2}(S), \tag{46}$$

where $M_Y(S) = E[e^{S^T Y}]$, $E[\cdot]$ is an expectation, and superscript $T$ denotes the transpose of a vector.
For jointly Gaussian random vectors $Y_1$ and $Y_2$, this equality is a necessary and sufficient condition for
the independence between $Y_1$ and $Y_2$.

Lemma 5: For independent random vectors $Y_1$ and $Y_2$ given a random vector $Y_3$, the following equality
is satisfied:

$$M_{Y_1 + Y_2 | Y_3}(S) = M_{Y_1 | Y_3}(S) M_{Y_2 | Y_3}(S). \tag{47}$$

Lemma 6: For a Gaussian random vector $X$ with a mean $U_X$ and a covariance matrix $\Sigma_X$, the MGF
is expressed as

$$M_X(S) = \exp\left\{ S^T U_X + \frac{1}{2} S^T \Sigma_X S \right\}. \tag{48}$$

In the Markov chain (35), since all random vectors are Gaussian (without loss of generality, they are
assumed to have zero means), using Lemma 6, the following moment generating functions are presented
in closed-form expression:

$$M_{Y_1 | Y_3}(S) = \exp\left\{ S^T \Sigma_{Y_1} \Sigma_{Y_3}^{-1} Y_3 + \frac{1}{2} S^T \left( \Sigma_{Y_1} - \Sigma_{Y_1} \Sigma_{Y_3}^{-1} \Sigma_{Y_1} \right) S \right\},$$

$$M_{Y_2 | Y_3}(S) = \exp\left\{ S^T \Sigma_{Y_2} \Sigma_{Y_3}^{-1} Y_3 + \frac{1}{2} S^T \left( \Sigma_{Y_2} - \Sigma_{Y_2} \Sigma_{Y_3}^{-1} \Sigma_{Y_2} \right) S \right\}, \tag{49}$$

where $Y_1 = \hat{X}_G$, $Y_2 = \hat{X}_G + X_G^* + \tilde{W}_G$, $Y_3 = \hat{X}_G + X_G^* + \tilde{W}_G + \hat{W}_G$, and their covariance matrices are
represented by $\Sigma_{Y_1}$, $\Sigma_{Y_2}$, and $\Sigma_{Y_3}$, respectively. Since $\Sigma_{\tilde{W}} + \Sigma_{\hat{W}}$ is a positive definite matrix, there
exists the inverse of $\Sigma_{Y_3}$.
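The closed form of the Gaussian MGF in Lemma 6 and the product rule of Lemma 4 can be spot-checked in the scalar case. The sketch below assumes NumPy and uses a plain Riemann sum for the expectation; the function names, the integration window, and the numerical values are illustrative assumptions.

```python
import numpy as np

def gaussian_mgf(s, mean, var):
    """Closed-form MGF of a scalar Gaussian: exp(s*mean + s^2 * var / 2)."""
    return np.exp(s * mean + 0.5 * s * s * var)

def mgf_by_quadrature(s, mean, var, half_width=12.0, num=200001):
    """E[exp(s X)] for X ~ N(mean, var), approximated by a Riemann sum."""
    std = np.sqrt(var)
    x = np.linspace(mean - half_width * std, mean + half_width * std, num)
    pdf = np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    dx = x[1] - x[0]
    return np.sum(np.exp(s * x) * pdf) * dx

s = 0.8
closed = gaussian_mgf(s, mean=0.3, var=2.0)
numeric = mgf_by_quadrature(s, mean=0.3, var=2.0)

# Product rule for independent Gaussians: means and variances add under summation.
mgf_sum = gaussian_mgf(s, 0.3 + (-1.0), 2.0 + 0.5)
mgf_product = gaussian_mgf(s, 0.3, 2.0) * gaussian_mgf(s, -1.0, 0.5)
```

The conditional MGFs used in the proof follow the same closed form, with the conditional mean and conditional covariance of jointly Gaussian vectors in place of `mean` and `var`.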

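On the other hand, the MGF of $Y_1 + Y_2$ given $Y_3$ is represented as

$$M_{Y_1 + Y_2 | Y_3}(S) = \exp\left\{ S^T \left( \Sigma_{Y_1} + \Sigma_{Y_2} \right) \Sigma_{Y_3}^{-1} Y_3 + \frac{1}{2} S^T \left( \Sigma_{Y_1} - \Sigma_{Y_1} \Sigma_{Y_3}^{-1} \Sigma_{Y_1} + \Sigma_{Y_2} - \Sigma_{Y_2} \Sigma_{Y_3}^{-1} \Sigma_{Y_2} \right) S \right\}$$

$$\times \exp\left\{ \frac{1}{2} S^T \left( \Sigma_{Y_1} - \Sigma_{Y_1} \Sigma_{Y_3}^{-1} \Sigma_{Y_2} + \Sigma_{Y_1} - \Sigma_{Y_2} \Sigma_{Y_3}^{-1} \Sigma_{Y_1} \right) S \right\}$$

$$= M_{Y_1 | Y_3}(S) M_{Y_2 | Y_3}(S) \underbrace{\exp\left\{ \frac{1}{2} S^T \left( \Sigma_{Y_1} - \Sigma_{Y_1} \Sigma_{Y_3}^{-1} \Sigma_{Y_2} + \Sigma_{Y_1} - \Sigma_{Y_2} \Sigma_{Y_3}^{-1} \Sigma_{Y_1} \right) S \right\}}_{(A)}. \tag{50}$$

If the exponent of the term (A) in (50) vanishes, $Y_1$ and $Y_2$ are independent given $Y_3$, and the Markov chain (35) is
obtained. Using Lemma 11, (1) in [12], we define the covariance matrix $\Sigma_{\tilde{W}}$ as

$$\Sigma_{\tilde{W}} = \left( (\Sigma_X + \Sigma_W)^{-1} + L \right)^{-1} - \Sigma_X, \tag{51}$$

where $L \succeq 0$, and $0$ denotes an $n$-by-$n$ zero matrix. The positive semi-definite matrix $L$ must be chosen
to satisfy

$$\Sigma_{X^*} \preceq \Sigma_X, \tag{52}$$

$$L \Sigma_{\hat{X}} = \Sigma_{\hat{X}} L = 0, \tag{53}$$

where $\Sigma_{X^*} = (\mu - 1)^{-1} \Sigma_{\tilde{W}}$, $\Sigma_{\hat{X}} = \Sigma_X - \Sigma_{X^*}$, and $L \succeq 0$. Lemma 7 will prove that such a positive
semi-definite matrix $L$ exists.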

Lemma 7: There exists a positive semi-definite matrix $L$ which satisfies

$$\Sigma_{X^*} \preceq \Sigma_X, \quad L \Sigma_{\hat{X}} = \Sigma_{\hat{X}} L = 0, \tag{54}$$

where $\Sigma_{\tilde{W}} = \left( (\Sigma_X + \Sigma_W)^{-1} + L \right)^{-1} - \Sigma_X$, $\Sigma_{X^*} = (\mu - 1)^{-1} \Sigma_{\tilde{W}}$, $\Sigma_{\hat{X}} = \Sigma_X - \Sigma_{X^*}$, and $\Sigma_X$ and
$\Sigma_W$ stand for a positive semi-definite matrix and a positive definite matrix, respectively.
Proof: See Appendix A.

Remark 2: By directly using Lemma 7 in (45), one can prove Theorem 3. However, we prefer to
include explicitly in the proof the step which exploits the equality condition in the data processing
inequality and the moment generating function. This is due to the following reasons. First, the included
step shows how to come up with Lemma 7 and helps to understand the intuition behind the proposed
proof. Second, the proposed step guarantees the fact that the optimal solutions must force the exponent of the factor (A)
in (50) to be zero. In other words, the proposed step provides the necessary condition for the optimality.



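The equation (51) can be re-written as

$$\Sigma_X + \Sigma_{\tilde{W}} = \left( (\Sigma_X + \Sigma_W)^{-1} + L \right)^{-1} \tag{55}$$

$$\Leftrightarrow \; (\Sigma_X + \Sigma_{\tilde{W}})^{-1} = (\Sigma_X + \Sigma_W)^{-1} + L. \tag{56}$$

Since $L \Sigma_{\hat{X}} = \Sigma_{\hat{X}} L = 0$, by multiplying $\Sigma_{\hat{X}}$ to both sides of the equation (56),

$$(\Sigma_X + \Sigma_{\tilde{W}})^{-1} \Sigma_{\hat{X}} = (\Sigma_X + \Sigma_W)^{-1} \Sigma_{\hat{X}} + L \Sigma_{\hat{X}} = (\Sigma_X + \Sigma_W)^{-1} \Sigma_{\hat{X}}, \tag{57}$$

and

$$\Sigma_{\hat{X}} (\Sigma_X + \Sigma_{\tilde{W}})^{-1} = \Sigma_{\hat{X}} (\Sigma_X + \Sigma_W)^{-1}. \tag{58}$$

Since random vectors $Y_1$, $Y_2$, and $Y_3$ are defined as $Y_1 = \hat{X}_G$, $Y_2 = \hat{X}_G + X_G^* + \tilde{W}_G$, and $Y_3 =
\hat{X}_G + X_G^* + \tilde{W}_G + \hat{W}_G$, respectively, and $\hat{X}_G$, $X_G^*$, $\tilde{W}_G$, and $\hat{W}_G$ are independent of each other, their covariance matrices
are represented as

$$\Sigma_{Y_1} = \Sigma_{\hat{X}},$$

$$\Sigma_{Y_2} = \Sigma_{\hat{X}} + \Sigma_{X^*} + \Sigma_{\tilde{W}} = \Sigma_X + \Sigma_{\tilde{W}},$$

$$\Sigma_{Y_3} = \Sigma_{\hat{X}} + \Sigma_{X^*} + \Sigma_{\tilde{W}} + \Sigma_{\hat{W}} = \Sigma_{\hat{X}} + \Sigma_{X^*} + \Sigma_W = \Sigma_X + \Sigma_W. \tag{59}$$

From the equations (57) and (59), it follows that

$$\Sigma_{Y_1} - \Sigma_{Y_2} \Sigma_{Y_3}^{-1} \Sigma_{Y_1} = \Sigma_{\hat{X}} - (\Sigma_{\hat{X}} + \Sigma_{X^*} + \Sigma_{\tilde{W}}) (\Sigma_{\hat{X}} + \Sigma_{X^*} + \Sigma_W)^{-1} \Sigma_{\hat{X}}$$

$$= (\Sigma_{\hat{X}} + \Sigma_{X^*} + \Sigma_{\tilde{W}}) \left( (\Sigma_{\hat{X}} + \Sigma_{X^*} + \Sigma_{\tilde{W}})^{-1} \Sigma_{\hat{X}} - (\Sigma_{\hat{X}} + \Sigma_{X^*} + \Sigma_W)^{-1} \Sigma_{\hat{X}} \right)$$

$$= 0, \tag{60}$$

and from the equations (58) and (59), one can infer that

$$\Sigma_{Y_1} - \Sigma_{Y_1} \Sigma_{Y_3}^{-1} \Sigma_{Y_2} = \Sigma_{\hat{X}} - \Sigma_{\hat{X}} (\Sigma_{\hat{X}} + \Sigma_{X^*} + \Sigma_W)^{-1} (\Sigma_{\hat{X}} + \Sigma_{X^*} + \Sigma_{\tilde{W}})$$

$$= \left( \Sigma_{\hat{X}} (\Sigma_{\hat{X}} + \Sigma_{X^*} + \Sigma_{\tilde{W}})^{-1} - \Sigma_{\hat{X}} (\Sigma_{\hat{X}} + \Sigma_{X^*} + \Sigma_W)^{-1} \right) (\Sigma_{\hat{X}} + \Sigma_{X^*} + \Sigma_{\tilde{W}})$$

$$= 0. \tag{61}$$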
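To see the cancellation in (60) and (61) on concrete numbers, the following sketch builds a hand-made diagonal instance. This is an assumption made for illustration: it is not the general construction of Lemma 7 or Appendix A, and all variable names and numerical values are hypothetical. In each coordinate, either the matrix $L$ is zero (the constraint is inactive) or the first Gaussian component of $X_G$ has zero variance (the constraint is active and $L$ is solved from (51)).

```python
import numpy as np

mu = 2.0
sig_x = np.diag([3.0, 0.4])  # covariance of X_G (illustrative values)
sig_w = np.diag([1.0, 2.0])  # covariance of W_G (illustrative values)
x, w = np.diag(sig_x), np.diag(sig_w)

# Coordinate-wise choice: where w/(mu-1) <= x take L = 0 (first noise part equals W_G);
# elsewhere force the first component of X_G to zero variance and solve (51) for L.
slack = w / (mu - 1.0) <= x
w_tilde = np.where(slack, w, (mu - 1.0) * x)
big_l = np.diag(np.where(slack, 0.0, 1.0 / (mu * x) - 1.0 / (x + w)))

sig_wt = np.diag(w_tilde)          # covariance of the first noise component
sig_xstar = sig_wt / (mu - 1.0)    # covariance of X_G^*
sig_xhat = sig_x - sig_xstar       # covariance of the remaining component of X_G

# Right-hand side of (51), to be compared with sig_wt.
rhs_51 = np.linalg.inv(np.linalg.inv(sig_x + sig_w) + big_l) - sig_x

# Covariances in (59) and the two cross terms in (60) and (61).
sig_y1, sig_y2, sig_y3 = sig_xhat, sig_x + sig_wt, sig_x + sig_w
inv_y3 = np.linalg.inv(sig_y3)
term_60 = sig_y1 - sig_y2 @ inv_y3 @ sig_y1
term_61 = sig_y1 - sig_y1 @ inv_y3 @ sig_y2
```

In this instance both cross terms come out as zero matrices, so the conditional MGF factorizes as required by the Markov chain (35).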



The more general problem, originally proved in [11], is now considered in Theorem 4.

Theorem 4: For an arbitrary random vector $X$ with a covariance matrix $\Sigma_X$, two independent Gaussian random
vectors $W_G$ and $V_G$ with covariance matrices $\Sigma_W$ and $\Sigma_V$, respectively, and a positive semi-definite
matrix $R$, there exists a Gaussian random vector $X_G^*$ with a covariance matrix $\Sigma_{X^*}$ which satisfies the
following inequality:

$$h(X + W_G) - \mu h(X + V_G) \le h(X_G^* + W_G) - \mu h(X_G^* + V_G), \tag{62}$$

where the constant $\mu \ge 1$, all random vectors are independent of each other, $\Sigma_W$ is a positive definite
matrix, $\Sigma_X \preceq R$, and $\Sigma_{X^*} \preceq R$.

Proof: Due to the same reason mentioned in the proof of Theorem 3, without loss of generality, we
assume $\mu > 1$ and $R$ is a positive definite matrix. The proof is generally similar to the proof of Theorem
3. Using Lemma 3, the inequality (62) can be expressed as

$$h(X + W_G) - \mu h(X + V_G) \le h(X + \tilde{W}_G) - \mu h(X + V_G) + h(W_G) - h(\tilde{W}_G) \tag{63}$$

$$\le h(X_G^* + \tilde{W}_G) - \mu h(X_G^* + V_G) + h(W_G) - h(\tilde{W}_G) \tag{64}$$

$$= h(X_G^* + W_G) - \mu h(X_G^* + V_G), \tag{65}$$

where $\tilde{W}_G$ is chosen to be a Gaussian random vector whose covariance matrix, $\Sigma_{\tilde{W}}$, satisfies

^ Siy, (66) 

^ (67) 

where $\Sigma_{\tilde{V}}$ is the covariance matrix of the Gaussian random vector $\tilde{V}_G$, $\hat{V}_G$ is a Gaussian random vector
with covariance matrix $\Sigma_{\hat{V}}$, $V_G = \tilde{W}_G + \tilde{V}_G + \hat{V}_G$, and $\tilde{W}_G$, $\tilde{V}_G$, and $\hat{V}_G$ are independent of one another.

The inequality in (63) is due to Lemma 3, the inequality (64) is due to Theorem 3, and the equality
(65) will be proved using the equality condition in Lemma 3. We will also prove that there exists a
Gaussian random vector $\tilde{W}_G$ which satisfies the equations (66) and (67) by proving later Lemma 8.

To satisfy the equality in the equation (65), the equality condition in Lemma 3 must be satisfied, and
the following two Markov chains are formed:

1) $$X_G^* \to X_G^* + \tilde{W}_G \to X_G^* + \tilde{W}_G + \hat{W}_G, \tag{68}$$

2) $$X_G^* \to X_G^* + \tilde{W}_G + \hat{W}_G \to X_G^* + \tilde{W}_G, \tag{69}$$

where all random vectors are normally distributed, $\tilde{W}_G$ and $\hat{W}_G$ are independent of each other, $W_G =
\tilde{W}_G + \hat{W}_G$, and $X_G^*$ is independent of other random vectors.

The Markov chain (68) is naturally formed since $X_G^*$, $\tilde{W}_G$, and $\hat{W}_G$ are independent Gaussian random
vectors. The validity of the Markov chain (69) is proved using the concept of moment generating function.
In the Markov chain (69), since all random vectors are Gaussian (without loss of generality, they are
assumed to have zero means), using Lemma 6, the following moment generating functions are expressed
in closed-form:

$$M_{Y_1 | Y_3}(S) = \exp\left\{ S^T \Sigma_{Y_1} \Sigma_{Y_3}^{-1} Y_3 + \frac{1}{2} S^T \left( \Sigma_{Y_1} - \Sigma_{Y_1} \Sigma_{Y_3}^{-1} \Sigma_{Y_1} \right) S \right\},$$

$$M_{Y_2 | Y_3}(S) = \exp\left\{ S^T \Sigma_{Y_2} \Sigma_{Y_3}^{-1} Y_3 + \frac{1}{2} S^T \left( \Sigma_{Y_2} - \Sigma_{Y_2} \Sigma_{Y_3}^{-1} \Sigma_{Y_2} \right) S \right\}, \tag{70}$$

where $Y_1 = X_G^*$, $Y_2 = X_G^* + \tilde{W}_G$, $Y_3 = X_G^* + \tilde{W}_G + \hat{W}_G$, and their covariance matrices are represented by
$\Sigma_{Y_1}$, $\Sigma_{Y_2}$, and $\Sigma_{Y_3}$, respectively. Since $\Sigma_W$ is a positive definite matrix, there always exists the inverse
of $\Sigma_{Y_3}$.

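On the other hand, the MGF of $Y_1 + Y_2$ given $Y_3$ is represented as

$$M_{Y_1 + Y_2 | Y_3}(S) = \exp\left\{ S^T \left( \Sigma_{Y_1} + \Sigma_{Y_2} \right) \Sigma_{Y_3}^{-1} Y_3 + \frac{1}{2} S^T \left( \Sigma_{Y_1} - \Sigma_{Y_1} \Sigma_{Y_3}^{-1} \Sigma_{Y_1} + \Sigma_{Y_2} - \Sigma_{Y_2} \Sigma_{Y_3}^{-1} \Sigma_{Y_2} \right) S \right\}$$

$$\times \exp\left\{ \frac{1}{2} S^T \left( \Sigma_{Y_1} - \Sigma_{Y_1} \Sigma_{Y_3}^{-1} \Sigma_{Y_2} + \Sigma_{Y_1} - \Sigma_{Y_2} \Sigma_{Y_3}^{-1} \Sigma_{Y_1} \right) S \right\}$$

$$= M_{Y_1 | Y_3}(S) M_{Y_2 | Y_3}(S) \underbrace{\exp\left\{ \frac{1}{2} S^T \left( \Sigma_{Y_1} - \Sigma_{Y_1} \Sigma_{Y_3}^{-1} \Sigma_{Y_2} + \Sigma_{Y_1} - \Sigma_{Y_2} \Sigma_{Y_3}^{-1} \Sigma_{Y_1} \right) S \right\}}_{(B)}. \tag{71}$$

If the exponent of the factor (B) in (71) vanishes, $Y_1$ and $Y_2$ are independent given $Y_3$, and the Markov chain (69) is
obtained. Using Lemma 11, (1) in [12], we define a covariance matrix $\Sigma_{\tilde{W}}$ as follows:

$$\Sigma_{\tilde{W}} = \left( \Sigma_W^{-1} + K \right)^{-1}, \tag{72}$$

where $K \succeq 0$, $K \Sigma_{X^*} = \Sigma_{X^*} K = 0$, and $0$ denotes an $n$-by-$n$ zero matrix. Then, there exists a positive
semi-definite matrix $K$ which satisfies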

^ (73) 
KSx. = Sx*K = 0, (74) 

(75) 

where Sx* = {fJ,—l)~^'Sy — 'S^r, and "Sy is a positive semi-definite matrix, which satisfies the following 
condition: ^ Sy. The existence of matrix K is proved by the following lemma. 
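Lemma 8: There always exists a positive semi-definite matrix $K$ which satisfies

$$K \Sigma_{X^*} = \Sigma_{X^*} K = 0, \tag{77}$$

where $\Sigma_{X^*} = (\mu - 1)^{-1} \Sigma_{\tilde{V}} - \Sigma_{\tilde{W}}$ and $\Sigma_{\tilde{W}} = \left( \Sigma_W^{-1} + K \right)^{-1}$.

Since $\Sigma_{\tilde{W}}$ is defined as $\left( \Sigma_W^{-1} + K \right)^{-1}$ in (72), $\Sigma_{\tilde{W}}$ satisfies

$$(\Sigma_{X^*} + \Sigma_{\tilde{W}})^{-1} = (\Sigma_{X^*} + \Sigma_W)^{-1} + K, \tag{78}$$

based on Lemma 11, (1) in [12].

Since $K \Sigma_{X^*} = \Sigma_{X^*} K = 0$, multiplying with $\Sigma_{X^*}$ both sides of the equation (78), it follows that:

$$(\Sigma_{X^*} + \Sigma_{\tilde{W}})^{-1} \Sigma_{X^*} = (\Sigma_{X^*} + \Sigma_W)^{-1} \Sigma_{X^*} + K \Sigma_{X^*} = (\Sigma_{X^*} + \Sigma_W)^{-1} \Sigma_{X^*}, \tag{79}$$

and

$$\Sigma_{X^*} (\Sigma_{X^*} + \Sigma_{\tilde{W}})^{-1} = \Sigma_{X^*} (\Sigma_{X^*} + \Sigma_W)^{-1}. \tag{80}$$

Random vectors $Y_1$, $Y_2$, and $Y_3$ are defined as $Y_1 = X_G^*$, $Y_2 = X_G^* + \tilde{W}_G$, and $Y_3 = X_G^* + \tilde{W}_G + \hat{W}_G$,
respectively, and $X_G^*$, $\tilde{W}_G$, and $\hat{W}_G$ are independent of each other. Therefore, their covariance matrices
are represented as

$$\Sigma_{Y_1} = \Sigma_{X^*},$$

$$\Sigma_{Y_2} = \Sigma_{X^*} + \Sigma_{\tilde{W}},$$

$$\Sigma_{Y_3} = \Sigma_{X^*} + \Sigma_{\tilde{W}} + \Sigma_{\hat{W}} = \Sigma_{X^*} + \Sigma_W. \tag{81}$$

From (79) and (81), one can infer that

$$\Sigma_{Y_1} - \Sigma_{Y_2} \Sigma_{Y_3}^{-1} \Sigma_{Y_1} = \Sigma_{X^*} - (\Sigma_{X^*} + \Sigma_{\tilde{W}}) (\Sigma_{X^*} + \Sigma_W)^{-1} \Sigma_{X^*} = 0, \tag{82}$$

and from (80) and (81), it follows similarly that

$$\Sigma_{Y_1} - \Sigma_{Y_1} \Sigma_{Y_3}^{-1} \Sigma_{Y_2} = \Sigma_{X^*} - \Sigma_{X^*} (\Sigma_{X^*} + \Sigma_W)^{-1} (\Sigma_{X^*} + \Sigma_{\tilde{W}}) = 0. \tag{83}$$

Since the inverse matrix of $\Sigma_{\tilde{W}}$ exists, $(\Sigma_{X^*} + \Sigma_{\tilde{W}})^{-1}$ also exists.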

Therefore, the exponent of (B) in the equation (71) is zero, and $M_{Y_1 + Y_2 | Y_3}(S) = M_{Y_1 | Y_3}(S) M_{Y_2 | Y_3}(S)$. It means $Y_1$
and $Y_2$ are independent given $Y_3$, i.e., $X_G^*$ and $X_G^* + \tilde{W}_G$ are independent given $X_G^* + \tilde{W}_G + \hat{W}_G$, and
the Markov chain (69) is valid. The equality in the equation (65) is achieved by the above procedure.
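The matrix algebra behind the Markov chain (69) can be sanity-checked numerically. The sketch below assumes NumPy and uses a hand-picked instance: a rank-deficient covariance for $X_G^*$ so that a nonzero $K$ orthogonal to it exists, plus an arbitrary positive definite noise covariance. The variable names and values are illustrative assumptions, and the check covers only the identities (78), (82), and (83), not the dependence of the optimal covariance on $\mu$.

```python
import numpy as np

sig_xstar = np.diag([1.0, 0.0])              # rank-deficient covariance of X_G^* (assumption)
k = np.diag([0.0, 0.7])                      # K >= 0 with K @ sig_xstar = sig_xstar @ K = 0
sig_w = np.array([[2.0, 0.5], [0.5, 1.0]])   # positive definite covariance of W_G (assumption)

# Definition (72) of the covariance of the first noise component.
sig_wt = np.linalg.inv(np.linalg.inv(sig_w) + k)

# Both sides of identity (78).
lhs_78 = np.linalg.inv(sig_xstar + sig_wt)
rhs_78 = np.linalg.inv(sig_xstar + sig_w) + k

# Covariances (81) and the cross terms (82), (83).
sig_y1 = sig_xstar
sig_y2 = sig_xstar + sig_wt
sig_y3 = sig_xstar + sig_w
inv_y3 = np.linalg.inv(sig_y3)
term_82 = sig_y1 - sig_y2 @ inv_y3 @ sig_y1
term_83 = sig_y1 - sig_y1 @ inv_y3 @ sig_y2

# The second noise component needs a valid covariance: sig_w - sig_wt must be PSD.
min_eig_what = np.linalg.eigvalsh(sig_w - sig_wt).min()
```

Note that $K$ is nonzero only on directions where the covariance of $X_G^*$ vanishes; with a full-rank covariance the orthogonality condition forces $K = 0$.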



This completes the proof.

IV. Applications of the Proposed Results

The versatility of the extremal entropy inequality was already illustrated by means of several
applications in [11]. However, the original proofs of the extremal entropy inequality in [11] were based on
the channel enhancement technique while one of those applications, the capacity of the vector Gaussian
broadcast channel, had been already proved by the channel enhancement technique in [12]. Even though
the EEI was adapted to prove the capacity of the vector Gaussian broadcast channel in [11], it failed to
show a novel perspective since the proof of the EEI relied on the channel enhancement technique [12].
On the other hand, based on our proof, the extremal inequality shows not only its usefulness but also a
novel perspective to prove the capacity of the vector Gaussian broadcast channel.

To illustrate the usefulness of the proposed mathematical framework for proving the EEI, this section
proposes three additional applications for the mathematical results presented in Section III. First, an
alternative much simplified approach for proving the EEI is provided. Second, the optimal
solution of a broadcasting channel with a private message is found. Third, the MMSE performance of
a linear Bayesian estimator for a Gaussian source embedded in additive noise is characterized.

A. Another Novel Simplified Approach for Establishing the EEI

Based on the proof presented in the previous section, one can come up with another more simplified proof
for the EEI. This method relies partly on calculus of variations techniques and partly on the results
established in the previous section. However, this novel framework is very general and can be further
adapted to proving many other information theoretic inequalities. In this regard, a companion paper was
submitted for publication [28]. The proposed simplified proof of the EEI runs as follows.

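First, select a Gaussian random vector $\tilde{W}_G$ whose covariance matrix $\Sigma_{\tilde{W}}$ satisfies $\Sigma_{\tilde{W}} \preceq \Sigma_W$ and $\Sigma_{\tilde{W}} \preceq \Sigma_V$. Since the Gaussian random vectors $V_G$ and $W_G$ can be represented as the sum of two independent random vectors $\tilde{W}_G$ and $\hat{V}_G$, and as the sum of two independent random vectors $\tilde{W}_G$ and $\hat{W}_G$, respectively, the LHS of (62) is expressed as follows:

$$\begin{aligned}
h(X + W_G) - \mu h(X + V_G)
&\le h(X + \tilde{W}_G) - \mu h(X + V_G) + h(W_G) - h(\tilde{W}_G) \\
&= h(X + \tilde{W}_G) - \mu h(X + \tilde{W}_G + \hat{V}_G) + h(\tilde{W}_G + \hat{W}_G) - h(\tilde{W}_G).
\end{aligned} \tag{84}$$

The inequality follows from the data processing inequality, since $I(X; X + W_G) \le I(X; X + \tilde{W}_G)$.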

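Since the expression is maximized over $f_X(x)$, the last two terms in (84) are ignored, and by defining the new random vectors $\tilde{Y}$ and $\tilde{X}$ as $X + \tilde{W}_G + \hat{V}_G$ and $X + \tilde{W}_G$, respectively, the inequality in (62) is equivalently expressed as the following variational problem:

$$\max_{f_{\tilde{X}}}\; h(\tilde{X}) - \mu h(\tilde{Y}) + (\mu - 1)\,h(\hat{V}_G) \tag{85}$$

$$\begin{aligned}
\text{s.t.}\quad & \iint f_{\tilde{X}}(x) f_{\hat{V}}(y - x)\,dx\,dy - 1 = 0, \\
& \iint f_{\tilde{X}}(x) f_{\hat{V}}(y - x)\,x x^T dx\,dy - \Sigma_{\tilde{X}^*} \preceq 0, \\
& \iint f_{\tilde{X}}(x) f_{\hat{V}}(y - x)\,y y^T dx\,dy - \Sigma_{\tilde{Y}^*} = 0, \\
& \iint f_{\tilde{X}}(x) f_{\hat{V}}(y - x)\left(y y^T - x x^T - (y - x)(y - x)^T\right) dx\,dy = 0, \\
& -\iint f_{\tilde{X}}(x) f_{\hat{V}}(y - x) \log f_{\tilde{X}}(x)\,dx\,dy = p_{\tilde{X}}, \\
& f_{\tilde{Y}}(y) = \int f_{\tilde{X}}(x) f_{\hat{V}}(y - x)\,dx,
\end{aligned} \tag{86}$$

where $x$ and $y$ are vectors, $\tilde{X} = X + \tilde{W}_G$, $\tilde{Y} = \tilde{X} + \hat{V}_G$, $W_G = \tilde{W}_G + \hat{W}_G$, $V_G = \tilde{W}_G + \hat{V}_G$, $\Sigma_{\tilde{X}^*} = \Sigma_{X^*} + \Sigma_{\tilde{W}}$, $\Sigma_{\tilde{Y}^*} = \Sigma_{\tilde{X}^*} + \Sigma_{\hat{V}}$, and $\Sigma_{X^*}$ is the covariance matrix of the optimal solution $X^*$.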



Using Euler's equations, we can solve this variational problem, and the problem in (85) is maximized when both $\tilde{X}$ and $\tilde{Y}$ are Gaussian random vectors (as shown in Appendix C). The important point here is that solving this variational problem requires only the calculus of variations, i.e., the proposed method requires neither the classical EPI nor the worst additive noise lemma.

Next, the following inequality is obtained:

$$\begin{aligned}
& h(X + \tilde{W}_G) - \mu h(X + \tilde{W}_G + \hat{V}_G) + h(\tilde{W}_G + \hat{W}_G) - h(\tilde{W}_G) \\
&\quad \le h(X_G^* + \tilde{W}_G) - \mu h(X_G^* + \tilde{W}_G + \hat{V}_G) + h(\tilde{W}_G + \hat{W}_G) - h(\tilde{W}_G).
\end{aligned} \tag{87}$$

Based on Lemma 8, the RHS of (87) is expressed as

$$\begin{aligned}
& h(X_G^* + \tilde{W}_G) - \mu h(X_G^* + \tilde{W}_G + \hat{V}_G) + h(\tilde{W}_G + \hat{W}_G) - h(\tilde{W}_G) \\
&\quad = h(X_G^* + W_G) - \mu h(X_G^* + \tilde{W}_G + \hat{V}_G),
\end{aligned} \tag{88}$$
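and therefore, from (84), (87), and (88), we obtain the following EEI:

$$\begin{aligned}
h(X + W_G) - \mu h(X + V_G)
&\le h(X + \tilde{W}_G) - \mu h(X + V_G) + h(W_G) - h(\tilde{W}_G) \\
&= h(X + \tilde{W}_G) - \mu h(X + \tilde{W}_G + \hat{V}_G) + h(\tilde{W}_G + \hat{W}_G) - h(\tilde{W}_G) \\
&\le h(X_G^* + \tilde{W}_G) - \mu h(X_G^* + \tilde{W}_G + \hat{V}_G) + h(\tilde{W}_G + \hat{W}_G) - h(\tilde{W}_G) \\
&= h(X_G^* + W_G) - \mu h(X_G^* + \tilde{W}_G + \hat{V}_G) \\
&= h(X_G^* + W_G) - \mu h(X_G^* + V_G),
\end{aligned}$$

and the proof is completed.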
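The two steps that carry the argument, the data processing inequality in (84) and the equality in (88), can be illustrated for Gaussian inputs with closed-form entropies. This is only a sketch under stated assumptions: the matrices are hypothetical, $\Sigma_{\tilde{W}} = (\Sigma_W^{-1} + K)^{-1}$ is assumed as in Appendix B, and the helper names `gaussian_entropy` and `mutual_info` are illustrative.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (in nats) of a Gaussian vector with covariance cov."""
    n = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (n * np.log(2.0 * np.pi * np.e) + logdet)

def mutual_info(sigma_x, sigma_noise):
    """I(X; X + N) for independent Gaussian X and N."""
    return gaussian_entropy(sigma_x + sigma_noise) - gaussian_entropy(sigma_noise)

sigma_w = np.array([[1.0, 0.5], [0.5, 2.0]])
K = np.diag([0.0, 3.0])
sigma_w_tilde = np.linalg.inv(np.linalg.inv(sigma_w) + K)   # Sigma_W~ <= Sigma_W

# Generic Gaussian input: only the inequality I(X; X+W) <= I(X; X+W~) holds.
sigma_x_generic = np.array([[1.0, 0.2], [0.2, 1.0]])
gap_generic = (mutual_info(sigma_x_generic, sigma_w_tilde)
               - mutual_info(sigma_x_generic, sigma_w))

# Input with K @ Sigma_X* = 0: the inequality is met with equality.
sigma_x_star = np.diag([2.0, 0.0])
gap_star = (mutual_info(sigma_x_star, sigma_w_tilde)
            - mutual_info(sigma_x_star, sigma_w))

print(gap_generic, gap_star)
```

The gap for the generic input is non-negative, while it vanishes (up to rounding) when $K\Sigma_{X^*} = 0$, which is the condition under which the chain of inequalities becomes tight.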

B. Broadcasting Channel with a Private Message 

Consider the practical communication set-up depicted in Figure 1, where a broadcasting channel with a private message is considered from the perspective of the mean square-error (MSE) performance metric. The input-output relationships of this broadcast channel are governed by the following equations:

$$Y_1 = X + Z_{G_1}, \qquad Y_2 = X + Z_{G_2}, \tag{89}$$

where $Z_{G_1}$ and $Z_{G_2}$ are additive Gaussian noise vectors with zero means and covariance matrices $\Sigma_{Z_{G_1}}$ and $\Sigma_{Z_{G_2}}$, respectively. The covariance matrices $\Sigma_{Z_{G_1}}$ and $\Sigma_{Z_{G_2}}$ are assumed to be positive definite. Matrix $\Sigma_X$ denotes the covariance matrix of $X$, and $R$ stands for a positive semi-definite matrix. Random vectors $X$, $Z_{G_1}$, and $Z_{G_2}$ are assumed independent of one another. Random vectors $Y_1$ and $Y_2$ denote the received signals at receiver 1 and receiver 2, respectively.

Assume that the message $X$ is intended to be decoded only at receiver 1, although either receiver can decode $X$ if it receives the message and its MSE is below a certain threshold $\mathrm{Tr}\{R\}$. Therefore, the question here is whether or not we can find a random vector $X$ which guarantees that the MSE at receiver 1 is below the threshold $\mathrm{Tr}\{R\}$, while the MSE at receiver 2 is above the threshold $\mathrm{Tr}\{R\}$, i.e., receiver 1 can decode the message $X$, but receiver 2 cannot. The notation $\mathrm{Tr}$ denotes the trace of a matrix. We are also interested in which distribution of $X$ is the most power efficient for maintaining such an MSE performance at the two receivers.




Fig. 1. Gaussian broadcasting (wire-tap) channel 



To compare the MSE performance of the two receivers, we assume that both receivers use minimum mean-square error (MMSE) estimators. Since the MMSE estimator is optimal in the sense that it achieves the lowest MSE, this assumption is reasonable.

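In summary, the goal of this problem is to find the optimal distribution $f_X(x)$ which solves the following problem:

$$\begin{aligned}
\min_{f_X(x)}\quad & \Sigma_X \\
\text{s.t.}\quad & \mathrm{Tr}\{\Sigma_{X|Y_1}\} \le \mathrm{Tr}\{R\} \le \mathrm{Tr}\{\Sigma_{X|Y_2}\},
\end{aligned} \tag{90}$$

where $\Sigma_{X|Y_2} = E\left[(X - E[X|Y_2])(X - E[X|Y_2])^T\right]$, and $\Sigma_{X|Y_1}$ is defined analogously.

The solution of the problem in (90) can be obtained by the following procedure. First, define a new Gaussian random vector $\tilde{Z}_{G_1}$ which satisfies $\Sigma_{\tilde{Z}_{G_1}} \preceq \Sigma_{Z_{G_1}}$ and $\Sigma_{\tilde{Z}_{G_1}} \preceq \Sigma_{Z_{G_2}}$, where $\Sigma_{\tilde{Z}_{G_1}}$ stands for the covariance matrix of $\tilde{Z}_{G_1}$. Second, find $X$ which satisfies $\mathrm{Tr}\{\Sigma_{X|Y_2}\} = \mathrm{Tr}\{R\}$. Then, $\mathrm{Tr}\{\Sigma_{X|\tilde{Y}_1}\} \le \mathrm{Tr}\{R\} = \mathrm{Tr}\{\Sigma_{X|Y_2}\}$ since $\Sigma_{X|\tilde{Y}_1} \preceq \Sigma_{X|Y_2}$, where $\tilde{Y}_1 = X + \tilde{Z}_{G_1}$, based on the data processing inequality for the covariance matrix [6]. Third, we will prove that there is a Gaussian $X_G^*$ with a covariance matrix $\Sigma_{X^*}$ which satisfies $\Sigma_{X_G^*|Y_2^*} = \Sigma_{X|Y_2}$ and $\Sigma_{X^*} \preceq \Sigma_X$, where $Y_2^* = X_G^* + Z_{G_2}$. Finally, based on Lemma 8, we will show $\Sigma_{X_G^*|\tilde{Y}_1^*} = \Sigma_{X_G^*|Y_1^*}$, where $\tilde{Y}_1^* = X_G^* + \tilde{Z}_{G_1}$ and $Y_1^* = X_G^* + Z_{G_1}$. Since $\Sigma_{X^*}$ is less than or equal to an arbitrary covariance matrix $\Sigma_X$ which satisfies $\mathrm{Tr}\{\Sigma_{X|Y_2}\} = \mathrm{Tr}\{R\}$, the Gaussian random vector $X_G^*$ is the optimal solution (the details of the proof are deferred to Appendix D).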

Therefore, by choosing the message $X$ as a Gaussian random vector in (89), we can securely transmit a private message, which is designed to arrive at receiver 1, in the most power efficient way.

Remark 3: This scenario can be interpreted as the secure transmission under a vector Gaussian wire-tap 
channel. In this case, the receiver 1 is the legitimate receiver, and the receiver 2 is the eavesdropper. 
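The MSE constraint in (90) can be explored numerically for a Gaussian input. The sketch below is not the construction of Appendix D: it assumes, purely for illustration, an isotropic input covariance $a I$, a degraded setting with $\Sigma_{Z_{G_1}} \preceq \Sigma_{Z_{G_2}}$, and a hypothetical threshold matrix `R`. It finds the power scale `a` for which the MMSE trace at receiver 2 equals $\mathrm{Tr}\{R\}$ by bisection and then evaluates the MMSE trace at receiver 1.

```python
import numpy as np

def mmse_cov(sigma_x, sigma_z):
    """Error covariance of the MMSE estimate of Gaussian X from Y = X + Z."""
    return sigma_x - sigma_x @ np.linalg.inv(sigma_x + sigma_z) @ sigma_x

def scale_to_threshold(shape, sigma_z, target_trace, lo=0.0, hi=1e6, iters=200):
    """Bisection on the scale a so that Tr{mmse_cov(a * shape, sigma_z)} = target_trace."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.trace(mmse_cov(mid * shape, sigma_z)) < target_trace:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

sigma_z1 = np.diag([0.5, 1.0])   # receiver 1 noise (assumed less noisy)
sigma_z2 = np.diag([2.0, 3.0])   # receiver 2 noise
R = np.diag([0.6, 0.6])          # hypothetical threshold matrix, Tr{R} = 1.2
shape = np.eye(2)                # assumed isotropic input shape

a = scale_to_threshold(shape, sigma_z2, np.trace(R))
sigma_x = a * shape
trace_rx1 = np.trace(mmse_cov(sigma_x, sigma_z1))
trace_rx2 = np.trace(mmse_cov(sigma_x, sigma_z2))

print(a, trace_rx1, trace_rx2)
```

The bisection relies on the MMSE trace being increasing in the input power, which holds in this Gaussian setting; other input shapes would trade power against the two traces differently.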

C. Bayesian Estimation of a Gaussian Source in Additive Noise Channel 
As shown in Figure 2, the following additive noise channel is considered:

$$Y = X_G + Z, \tag{91}$$

where $X_G$ is a Gaussian random vector with zero mean and covariance matrix $\Sigma_X$, $Z$ denotes an arbitrary random vector (noise) with zero mean and covariance matrix $\Sigma_Z$, and $X_G$ and $Z$ are assumed independent of each other. We also assume that the covariance matrix of the additive noise $Z$ is upper-bounded, i.e., $\Sigma_Z \preceq R$, where $R$ is a given positive semi-definite matrix.

Fig. 2. Additive Noise Channel

Using the channel model in (91), we will next analyze the link between the channel input-output mutual information and the MMSE performance of a linear Bayesian estimator. First, consider the following optimization problem:

$$\begin{aligned}
\min_{f_Z(z)}\quad & I(X_G; Y) \\
\text{s.t.}\quad & \Sigma_Z \preceq R.
\end{aligned} \tag{92}$$

The objective function in (92) can be expressed as

$$I(X_G; Y) = I(X_G; X_G + Z) = h(X_G + Z) - h(Z),$$

and using the extremal inequality in Theorem 3, it follows that the optimal solution of the optimization problem in (92) is a multivariate Gaussian density function, and the objective criterion can be expressed as

$$I(X_G; X_G + Z_G^*) = h(X_G + Z_G^*) - h(Z_G^*) = -\frac{1}{2}\log\frac{|R|}{|\Sigma_Y|}, \tag{93}$$

where $\Sigma_Y$ is the covariance matrix of $Y$, $Y = X_G + Z_G^*$, and $Z_G^*$ is a Gaussian random vector with the covariance matrix $R$. The right-hand side of (93) can be further expressed in terms of the MSE matrix of the linear minimum MSE (LMMSE) estimator under the worst case scenario, i.e., the covariance matrix $\Sigma_Z = R$, as follows. Given the channel (91), the LMMSE estimator for $X$ takes the form:

$$\hat{X} = E[X] + \Sigma_X(\Sigma_X + R)^{-1}(Y - E[Y]) = \Sigma_X(\Sigma_X + R)^{-1}Y, \tag{94}$$



where $E[\cdot]$ denotes the expectation, and $\hat{X}$ stands for the LMMSE estimator of $X$. The second equality in (94) is due to the zero means of $X$ and $Z$. Therefore, its MSE is expressed as

$$\mathrm{LMMSE} = \Sigma_X - \Sigma_X\Sigma_Y^{-1}\Sigma_X = \Sigma_X\Sigma_Y^{-1}R. \tag{95}$$

Using (95), the equation in (93) is expressed as

$$I(X_G; X_G + Z_G^*) = -\frac{1}{2}\log|\mathrm{LMMSE}| + \frac{1}{2}\log|\Sigma_X|, \tag{96}$$

and it follows that

$$I(X_G; X_G + Z) \ge I(X_G; X_G + Z_G^*) = -\frac{1}{2}\log|\mathrm{LMMSE}| + \frac{1}{2}\log|\Sigma_X|. \tag{97}$$

Based on (97), we can conclude the following facts. First, when the additive noise is Gaussian, minimizing the LMMSE is equivalent to maximizing the mutual information between the input and the output. Second, the worst case mutual information is expressed in terms of the LMMSE, while the mutual information in general is lower bounded by a function of the LMMSE. Finally, we observe that the LMMSE estimator is, in general, sub-optimal since the mutual information between the input and the output is larger than the function of the LMMSE, as shown in (97).
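The relation between the worst-case mutual information and the LMMSE matrix can be verified numerically. The sketch below uses hypothetical covariance matrices and checks that the two expressions for the LMMSE in (95) agree, and that $\frac{1}{2}\log(|\Sigma_Y|/|R|)$ from (93) equals $\frac{1}{2}\log|\Sigma_X| - \frac{1}{2}\log|\mathrm{LMMSE}|$, which follows from taking determinants in (95).

```python
import numpy as np

def logdet(m):
    """Log-determinant of a positive definite matrix."""
    return np.linalg.slogdet(m)[1]

sigma_x = np.array([[2.0, 0.3], [0.3, 1.0]])     # hypothetical source covariance
R = np.array([[1.0, -0.2], [-0.2, 0.5]])         # hypothetical worst-case noise covariance
sigma_y = sigma_x + R

# The two forms of the LMMSE matrix in (95).
lmmse = sigma_x - sigma_x @ np.linalg.inv(sigma_y) @ sigma_x
lmmse_alt = sigma_x @ np.linalg.inv(sigma_y) @ R

# Worst-case mutual information, computed from (93) and from the LMMSE.
mi_from_93 = 0.5 * (logdet(sigma_y) - logdet(R))
mi_from_lmmse = 0.5 * logdet(sigma_x) - 0.5 * logdet(lmmse)

print(mi_from_93, mi_from_lmmse)
```

Since $\Sigma_X$ is fixed, a smaller $|\mathrm{LMMSE}|$ corresponds to a larger mutual information in this Gaussian setting.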

V. Conclusions 

The main contributions of this paper are summarized as follows. In the first part of this paper, an alternative proof of the extremal entropy inequality is described in detail. The alternative proof is simpler, more direct, more explicit, and more information-theoretic than the original proofs. The alternative proof is mainly based on the data processing inequality, which makes it possible to bypass the KKT conditions. Moreover, using properties of positive semi-definite matrices, one can skip the step of proving the existence of the optimal solution which satisfies the KKT conditions, a step which is quite complicated to justify. This technique is based on a data processing inequality, and it is distinctive in that it offers a novel paradigm for many applications, such as the capacity of the vector Gaussian broadcast channel and the secrecy capacity of the Gaussian wire-tap channel, which were commonly proved based
on the channel enhancement technique ifTTl . |[T2l . |[T4l . and ifTSl . Additional relevant applications in 
this regard include ITSl - |[24l . In the second part of this paper, several additional important applications 
for the extremal entropy inequality are presented. In this regard, a second and even more simplified approach for establishing the extremal entropy inequality without using the EPI or the worst additive noise lemma is presented by exploiting the mathematical tools developed in the first part of this paper. Two additional applications of the proposed mathematical results are presented: determining the optimal solution for the broadcasting channel problem with a private message, and establishing a mutual information-based performance bound for the mean square-error of a linear Bayesian estimator for a Gaussian source in an additive noise channel. One can observe that the last application presented can be slightly extended to non-Gaussian sources, a fact that suggests that the extremal entropy inequality
([T]) might hold true even for non-Gaussian Wg- However, establishing an extension of the EEI in this 
direction or other directions, such as proving the EEI under a more general constraint (such as an upper and/or lower-bound constraint on the power spectral density of the random vector X instead of the covariance matrix constraint), represents an interesting open problem. Finally, we would like to thank the reviewers for their constructive comments and for bringing to our attention the reference [29], which presents a completely different approach for proving the EEI. This approach relies on showing the optimality of the Gaussian distribution by exploiting the factorization of concave envelopes. At the time of submitting our paper in August 2011, we were not aware of the parallel submission [29].

November 21, 2012 DRAFT

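Appendix A
Proof of Lemma 7

Proving $\Sigma_{X^*} \succeq 0$ is equivalent to proving the following:

$$\Sigma_{X^*} \succeq 0 \tag{98}$$
$$\Leftrightarrow\; \Sigma_{\tilde{W}} \preceq (\mu - 1)\Sigma_X \tag{99}$$
$$\Leftrightarrow\; \left((\Sigma_X + \Sigma_W)^{-1} + L\right)^{-1} - \Sigma_X \preceq (\mu - 1)\Sigma_X \tag{100}$$
$$\Leftrightarrow\; (\Sigma_X + \Sigma_W)^{-1} + L \succeq \mu^{-1}\Sigma_X^{-1}. \tag{101}$$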

Since there always exists a non-singular matrix which simultaneously diagonalizes two positive semi-definite matrices [26], there exists a non-singular matrix $Q$ which simultaneously diagonalizes both $\Sigma_X$ and $\Sigma_W$ as follows:

$$Q^T\Sigma_X Q = I, \tag{102}$$
$$Q^T\Sigma_W Q = D_W, \tag{103}$$

where $I$ is an identity matrix, and $D_W$ is a diagonal matrix. Since $Q$ is a non-singular matrix, the inverse of $Q$ always exists, and $\Sigma_X$ and $\Sigma_W$ are expressed as

$$\Sigma_X = Q^{-T}Q^{-1}, \tag{105}$$
$$\Sigma_W = Q^{-T}D_W Q^{-1}. \tag{106}$$

If we define $D_L$ as a diagonal matrix whose $i$-th diagonal element $d_{L_i}$ is defined as

$$d_{L_i} = \begin{cases} 0 & \text{if } d_{W_i} < \mu - 1, \\ \mu^{-1} - (1 + d_{W_i})^{-1} & \text{if } d_{W_i} \ge \mu - 1, \end{cases} \tag{107}$$

where $d_{W_i}$ denotes the $i$-th diagonal element of $D_W$, and define $L$ as

$$L = Q D_L Q^T, \tag{108}$$

then (101) is equivalent to

$$(\Sigma_X + \Sigma_W)^{-1} + L \succeq \mu^{-1}\Sigma_X^{-1} \tag{109}$$
$$\Leftrightarrow\; \left(Q^{-T}Q^{-1} + Q^{-T}D_W Q^{-1}\right)^{-1} + Q D_L Q^T \succeq \mu^{-1} Q Q^T \tag{110}$$
$$\Leftrightarrow\; (I + D_W)^{-1} + D_L \succeq \mu^{-1} I. \tag{111}$$

The inequality (111) always holds since $D_L$ is defined as in (107) and (108) to satisfy (111). Therefore, the inequality (98) is also satisfied.

We know that $\Sigma_{X^*} = \Sigma_X - (\mu - 1)^{-1}\Sigma_{\tilde{W}}$. Therefore,

$$\Sigma_{X^*}L = \left(\Sigma_X - (\mu - 1)^{-1}\Sigma_{\tilde{W}}\right)L, \tag{112}$$

and (112) is re-written as

$$\Sigma_{X^*}L = \left(\Sigma_X - (\mu - 1)^{-1}\Sigma_{\tilde{W}}\right)L \tag{113}$$
$$= \left[Q^{-T}Q^{-1} - (\mu - 1)^{-1}\left(\left(\left(Q^{-T}Q^{-1} + Q^{-T}D_W Q^{-1}\right)^{-1} + Q D_L Q^T\right)^{-1} - Q^{-T}Q^{-1}\right)\right] Q D_L Q^T \tag{114}$$
$$= (\mu - 1)^{-1} Q^{-T}\left(\mu I - \left((I + D_W)^{-1} + D_L\right)^{-1}\right) D_L Q^T \tag{115}$$
$$= 0. \tag{116}$$

The equality (114) is due to (105), (106), and (108), and the equality (116) is due to (107). Similarly,

$$L\Sigma_{X^*} = L\left(\Sigma_X - (\mu - 1)^{-1}\Sigma_{\tilde{W}}\right) = (\mu - 1)^{-1} Q D_L\left(\mu I - \left((I + D_W)^{-1} + D_L\right)^{-1}\right) Q^{-1} = 0.$$

Therefore, by defining $\Sigma_{\tilde{W}} = \left((\Sigma_X + \Sigma_W)^{-1} + L\right)^{-1} - \Sigma_X$, we can make $\Sigma_{\tilde{W}}$ satisfy

$$\Sigma_{\tilde{W}} \preceq (\mu - 1)\Sigma_X, \qquad \Sigma_{X^*}L = L\Sigma_{X^*} = 0, \tag{117}$$

and the proof is completed.

Remark 4: Since the optimization problem in [11] is generally nonconvex, the existence of the optimal solution must be proved [11], [12], and this step is very complicated. However, in our proof, Lemmas 7 and 8 serve as a substitute for this step since we bypass the KKT-condition related parts using the data processing inequality. This makes the proposed proof much simpler.
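The construction of $L$ in this appendix can be reproduced numerically. The following sketch makes several assumptions that should be read as such: the diagonal rule used for $D_L$ is the one forced by requiring $\Sigma_{X^*}L = 0$ together with (111), namely $d_{L_i} = 0$ if $d_{W_i} < \mu - 1$ and $d_{L_i} = \mu^{-1} - (1 + d_{W_i})^{-1}$ otherwise; the matrices and $\mu$ are hypothetical; and `simultaneous_diagonalizer` is an illustrative helper based on a Cholesky factor and a symmetric eigendecomposition.

```python
import numpy as np

def simultaneous_diagonalizer(a, b):
    """Return Q, d with Q.T @ a @ Q = I and Q.T @ b @ Q = diag(d); a must be positive definite."""
    c = np.linalg.cholesky(a)                 # a = c @ c.T
    c_inv = np.linalg.inv(c)
    d, u = np.linalg.eigh(c_inv @ b @ c_inv.T)
    return c_inv.T @ u, d

mu = 2.0
sigma_x = np.array([[1.0, 0.2], [0.2, 1.5]])  # hypothetical covariance constraint
sigma_w = np.array([[0.4, 0.1], [0.1, 4.0]])  # hypothetical noise covariance

Q, d_w = simultaneous_diagonalizer(sigma_x, sigma_w)

# Assumed diagonal rule for D_L (see the lead-in).
d_l = np.where(d_w < mu - 1.0, 0.0, 1.0 / mu - 1.0 / (1.0 + d_w))
L = Q @ np.diag(d_l) @ Q.T

sigma_w_tilde = np.linalg.inv(np.linalg.inv(sigma_x + sigma_w) + L) - sigma_x
sigma_x_star = sigma_x - sigma_w_tilde / (mu - 1.0)

print(np.allclose(sigma_x_star @ L, 0.0), np.linalg.eigvalsh(sigma_x_star).min())
```

In this example one generalized eigenvalue of $(\Sigma_W, \Sigma_X)$ lies below $\mu - 1$ and one above, so both branches of the rule are exercised.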

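Appendix B
Proof of Lemma 8

Proving $\Sigma_{\tilde{W}} \preceq (\mu - 1)^{-1}\Sigma_V$ is equivalent to proving the following:

$$\Sigma_{\tilde{W}} \preceq (\mu - 1)^{-1}\Sigma_V \tag{118}$$
$$\Leftrightarrow\; \Sigma_W^{-1} + K \succeq (\mu - 1)\Sigma_V^{-1}. \tag{119}$$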

Since there always exists a non-singular matrix which simultaneously diagonalizes two positive semi-definite matrices [26], there exists a non-singular matrix $Q$ which simultaneously diagonalizes both $\Sigma_W$ and $\Sigma_V$ as follows:

$$Q^T\Sigma_W Q = D_W, \tag{120}$$
$$Q^T\Sigma_V Q = I, \tag{121}$$

where $I$ is an identity matrix, and $D_W$ is a diagonal matrix. Since $Q$ is a non-singular matrix, the inverse of $Q$ always exists, and $\Sigma_W$ and $\Sigma_V$ are expressed as

$$\Sigma_W = Q^{-T}D_W Q^{-1}, \tag{122}$$
$$\Sigma_V = Q^{-T}Q^{-1}. \tag{123}$$

If we define $D_K$ as a diagonal matrix whose $i$-th diagonal element $d_{K_i}$ is defined as

$$d_{K_i} = \begin{cases} 0 & \text{if } d_{W_i} < (\mu - 1)^{-1}, \\ (\mu - 1) - d_{W_i}^{-1} & \text{if } d_{W_i} \ge (\mu - 1)^{-1}, \end{cases} \tag{124}$$

where $d_{W_i}$ denotes the $i$-th diagonal element of $D_W$, and define $K$ as

$$K = Q D_K Q^T, \tag{125}$$

then (119) is equivalent to

$$\Sigma_W^{-1} + K \succeq (\mu - 1)\Sigma_V^{-1} \tag{126}$$
$$\Leftrightarrow\; \left(Q^{-T}D_W Q^{-1}\right)^{-1} + Q D_K Q^T \succeq (\mu - 1)\left(Q^{-T}Q^{-1}\right)^{-1} \tag{127}$$
$$\Leftrightarrow\; D_W^{-1} + D_K \succeq (\mu - 1) I. \tag{128}$$

The inequality (128) always holds since $D_K$ is defined as in (124). Therefore, the inequality (118) is also satisfied.

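We know that $\Sigma_{X^*} = (\mu - 1)^{-1}\Sigma_V - \Sigma_{\tilde{W}}$. Therefore,

$$\Sigma_{X^*}K = (\mu - 1)^{-1}\left(\Sigma_V - (\mu - 1)\Sigma_{\tilde{W}}\right)K, \tag{129}$$

and (129) is re-written as

$$\Sigma_{X^*}K = (\mu - 1)^{-1}\left(\Sigma_V - (\mu - 1)\Sigma_{\tilde{W}}\right)K \tag{130}$$
$$= (\mu - 1)^{-1}\left(Q^{-T}Q^{-1} - (\mu - 1)\left(\left(Q^{-T}D_W Q^{-1}\right)^{-1} + Q D_K Q^T\right)^{-1}\right) Q D_K Q^T \tag{131}$$
$$= (\mu - 1)^{-1} Q^{-T}\left(I - (\mu - 1)\left(D_W^{-1} + D_K\right)^{-1}\right) D_K Q^T \tag{132}$$
$$= 0. \tag{133}$$

The equality (131) is due to (122), (123), and (125), and the equality (133) is due to (124). Similarly,

$$\begin{aligned}
K\Sigma_{X^*} &= (\mu - 1)^{-1} K\left(\Sigma_V - (\mu - 1)\Sigma_{\tilde{W}}\right) \\
&= (\mu - 1)^{-1} Q D_K Q^T\left(Q^{-T}Q^{-1} - (\mu - 1)\left(\left(Q^{-T}D_W Q^{-1}\right)^{-1} + Q D_K Q^T\right)^{-1}\right) \\
&= (\mu - 1)^{-1} Q D_K\left(I - (\mu - 1)\left(D_W^{-1} + D_K\right)^{-1}\right) Q^{-1} \\
&= 0.
\end{aligned}$$

Therefore, by defining $\Sigma_{\tilde{W}} = \left(\Sigma_W^{-1} + K\right)^{-1}$, we can make $\Sigma_{\tilde{W}}$ satisfy

$$\Sigma_{\tilde{W}} \preceq (\mu - 1)^{-1}\Sigma_V, \qquad \Sigma_{X^*}K = K\Sigma_{X^*} = 0, \tag{134}$$

and the proof is completed.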
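The construction of $K$ can be sketched in the same way. The assumptions are: $\Sigma_{\tilde{W}} = (\Sigma_W^{-1} + K)^{-1}$ and $\Sigma_{X^*} = (\mu - 1)^{-1}\Sigma_V - \Sigma_{\tilde{W}}$ as read from this appendix; the diagonal rule $d_{K_i} = 0$ if $d_{W_i} < (\mu - 1)^{-1}$ and $d_{K_i} = (\mu - 1) - d_{W_i}^{-1}$ otherwise, which is the choice that makes $\Sigma_{X^*}K$ vanish; hypothetical matrices; and an illustrative helper `simultaneous_diagonalizer`.

```python
import numpy as np

def simultaneous_diagonalizer(a, b):
    """Return Q, d with Q.T @ a @ Q = I and Q.T @ b @ Q = diag(d); a must be positive definite."""
    c = np.linalg.cholesky(a)                 # a = c @ c.T
    c_inv = np.linalg.inv(c)
    d, u = np.linalg.eigh(c_inv @ b @ c_inv.T)
    return c_inv.T @ u, d

mu = 2.0
sigma_v = np.array([[1.0, 0.2], [0.2, 1.5]])  # hypothetical covariance of V
sigma_w = np.array([[0.4, 0.1], [0.1, 4.0]])  # hypothetical covariance of W

# Q.T @ sigma_v @ Q = I and Q.T @ sigma_w @ Q = diag(d_w), as in (120)-(121).
Q, d_w = simultaneous_diagonalizer(sigma_v, sigma_w)

# Assumed diagonal rule for D_K (see the lead-in).
d_k = np.where(d_w < 1.0 / (mu - 1.0), 0.0, (mu - 1.0) - 1.0 / d_w)
K = Q @ np.diag(d_k) @ Q.T

sigma_w_tilde = np.linalg.inv(np.linalg.inv(sigma_w) + K)
sigma_x_star = sigma_v / (mu - 1.0) - sigma_w_tilde

print(np.allclose(sigma_x_star @ K, 0.0), np.linalg.eigvalsh(sigma_x_star).min())
```

The resulting $\Sigma_{\tilde{W}}$ is dominated by both $\Sigma_W$ and $(\mu - 1)^{-1}\Sigma_V$ in the positive semi-definite ordering, which is the property used in the main text.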

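Remark 5: In Lemmas 7 and 8, we specify the structure of the positive semi-definite matrices $L$ and $K$, and this yields additional details about the structure of the covariance matrix of the optimal solution.

Appendix C
A More Simplified Proof of the EEI

The problem in (85) is more appropriately re-formulated as follows:

$$\max_{f_{\tilde{X}}, f_{\tilde{Y}}} \iint f_{\tilde{X}}(x) f_{\hat{V}}(y - x)\left(\mu\log f_{\tilde{Y}}(y) - \log f_{\tilde{X}}(x) - (\mu - 1)\log f_{\hat{V}}(y - x)\right) dx\,dy \tag{135}$$

$$\begin{aligned}
\text{s.t.}\quad & \iint f_{\tilde{X}}(x) f_{\hat{V}}(y - x)\,dx\,dy = 1, \\
& \iint \left(y_i y_j - x_i x_j - (y - x)_i (y - x)_j\right) f_{\tilde{X}}(x) f_{\hat{V}}(y - x)\,dx\,dy = 0, \\
& \sum_{i=1}^{n}\sum_{j=1}^{n}\left(\iint x_i x_j f_{\tilde{X}}(x) f_{\hat{V}}(y - x)\,dx\,dy\right)\xi_i\xi_j \le \sum_{i=1}^{n}\sum_{j=1}^{n}\sigma^2_{\tilde{X}^*_{ij}}\xi_i\xi_j, \\
& \iint y_i y_j f_{\tilde{X}}(x) f_{\hat{V}}(y - x)\,dx\,dy = \sigma^2_{\tilde{Y}^*_{ij}}, \\
& -\iint f_{\tilde{X}}(x) f_{\hat{V}}(y - x)\log f_{\tilde{X}}(x)\,dx\,dy = p_{\tilde{X}}, \\
& f_{\tilde{Y}}(y) = \int f_{\tilde{X}}(x) f_{\hat{V}}(y - x)\,dx,
\end{aligned} \tag{136}$$

where the arbitrary deterministic non-zero vector $\xi$ is defined as $[\xi_1, \ldots, \xi_n]^T$, $\sigma^2_{\tilde{X}^*_{ij}}$ and $\sigma^2_{\tilde{Y}^*_{ij}}$ denote the $i$-th row and $j$-th column elements of $\Sigma_{\tilde{X}^*}$ and $\Sigma_{\tilde{Y}^*}$, respectively, $i = 1, \ldots, n$, and $j = 1, \ldots, n$.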

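Using Lagrange multipliers, the functional problem and its constraints in (135) are expressed as

$$\max_{f_{\tilde{X}}, f_{\tilde{Y}}} \int\left(\int K(x, y, f_{\tilde{X}}, f_{\tilde{Y}})\,dx + \tilde{K}(y, f_{\tilde{Y}})\right) dy, \tag{137}$$

where

$$\begin{aligned}
K(x, y, f_{\tilde{X}}, f_{\tilde{Y}}) = f_{\tilde{X}}(x) f_{\hat{V}}(y - x)\Big(& \mu\log f_{\tilde{Y}}(y) - \log f_{\tilde{X}}(x) - (\mu - 1)\log f_{\hat{V}}(y - x) + \alpha_0 \\
& + \sum_{i=1}^{n}\sum_{j=1}^{n}\left(\gamma_{ij} y_i y_j - \gamma_{ij} x_i x_j - \gamma_{ij}(y - x)_i (y - x)_j + \theta x_i x_j \xi_i \xi_j + \phi_{ij} y_i y_j\right) \\
& - \alpha_1\log f_{\tilde{X}}(x) - \lambda(y)\Big), \\
\tilde{K}(y, f_{\tilde{Y}}) = \lambda(y) f_{\tilde{Y}}(y), \qquad &
\end{aligned} \tag{138}$$

where $\alpha_0$, $\alpha_1$, $\gamma_{ij}$, $\theta$, $\phi_{ij}$, and $\lambda(y)$ stand for the Lagrange multipliers.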
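The first-order variation condition is checked as follows:

$$\int K'_{f_{\tilde{X}}}\,dy\;\Big|_{f_{\tilde{X}} = f_{\tilde{X}^*},\, f_{\tilde{Y}} = f_{\tilde{Y}^*}} = 0, \tag{139}$$

$$\left(\int K'_{f_{\tilde{Y}}}\,dx + \tilde{K}'_{f_{\tilde{Y}}}\right)\Big|_{f_{\tilde{X}} = f_{\tilde{X}^*},\, f_{\tilde{Y}} = f_{\tilde{Y}^*}} = 0, \tag{140}$$

where $K'_{f_{\tilde{X}}}$ and $K'_{f_{\tilde{Y}}}$ are the first-order partial derivatives with respect to $f_{\tilde{X}}$ and $f_{\tilde{Y}}$, respectively.¹

Since the equalities in (139) and (140) must be satisfied for any $x$ and $y$, one can easily obtain the following Gaussian density functions $f_{\tilde{X}^*}$ and $f_{\tilde{Y}^*}$ as solutions:

$$f_{\tilde{X}^*}(x) = (2\pi)^{-\frac{n}{2}}\left|\Sigma_{\tilde{X}^*}\right|^{-\frac{1}{2}}\exp\left\{-\frac{1}{2}x^T\Sigma_{\tilde{X}^*}^{-1}x\right\}, \qquad f_{\tilde{Y}^*}(y) = (2\pi)^{-\frac{n}{2}}\left|\Sigma_{\tilde{Y}^*}\right|^{-\frac{1}{2}}\exp\left\{-\frac{1}{2}y^T\Sigma_{\tilde{Y}^*}^{-1}y\right\}. \tag{141}$$

Since all the Lagrange multipliers exist in this problem, the necessary optimal solutions $f_{\tilde{X}^*}$ and $f_{\tilde{Y}^*}$ exist even though the original problem is non-convex in general.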

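To make the second variation non-positive, the negative semi-definiteness of the following matrix is required:

$$\begin{bmatrix} K''_{f_{\tilde{X}^*} f_{\tilde{X}^*}} & K''_{f_{\tilde{X}^*} f_{\tilde{Y}^*}} \\ K''_{f_{\tilde{Y}^*} f_{\tilde{X}^*}} & K''_{f_{\tilde{Y}^*} f_{\tilde{Y}^*}} \end{bmatrix}, \tag{142}$$

where $K''_{f_{\tilde{X}^*} f_{\tilde{X}^*}}$ and $K''_{f_{\tilde{Y}^*} f_{\tilde{Y}^*}}$ stand for the second-order partial derivatives with respect to $f_{\tilde{X}}$ and $f_{\tilde{Y}}$, respectively, and $K''_{f_{\tilde{X}^*} f_{\tilde{Y}^*}}$ denotes the second-order partial derivative with respect to $f_{\tilde{X}}$ and $f_{\tilde{Y}}$. Thus, the following condition is required to hold:

$$\begin{bmatrix} h_{\tilde{X}} & h_{\tilde{Y}} \end{bmatrix}\begin{bmatrix} K''_{f_{\tilde{X}^*} f_{\tilde{X}^*}} & K''_{f_{\tilde{X}^*} f_{\tilde{Y}^*}} \\ K''_{f_{\tilde{Y}^*} f_{\tilde{X}^*}} & K''_{f_{\tilde{Y}^*} f_{\tilde{Y}^*}} \end{bmatrix}\begin{bmatrix} h_{\tilde{X}} \\ h_{\tilde{Y}} \end{bmatrix} = K''_{f_{\tilde{X}^*} f_{\tilde{X}^*}} h_{\tilde{X}}^2 + K''_{f_{\tilde{Y}^*} f_{\tilde{Y}^*}} h_{\tilde{Y}}^2 + \left(K''_{f_{\tilde{X}^*} f_{\tilde{Y}^*}} + K''_{f_{\tilde{Y}^*} f_{\tilde{X}^*}}\right) h_{\tilde{X}} h_{\tilde{Y}} \le 0, \tag{143}$$

where $h_{\tilde{X}}$ and $h_{\tilde{Y}}$ are arbitrary admissible functions.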

Since K'i r , K'l r , K'l r , and K'l r are defined as 

fx* fx* fx*fY*' jY*Jx* JY*JY* 

(l-ai)/y(y-x) 



(143) 



K" 

fx* fx* 



K'L f 

Jx* JY 



K'l , f 

jY*Jx 



K'l f 

jY* jY 



/x*(x) 



fj'fyjy - x) 

f^fx*{^)fv{y -x) 
fY*{yY 



(144) 



¹Throughout the paper, the arguments of functionals or functions are omitted unless the arguments are ambiguous or confusing.

the equation in (143) requires

(l-«l)/y(y-x) 2^ /i/^(y-x) , . . ^/x>(x)/y(y-x) 3 



' X 

< 0, (145) 



where ai > 1 — /i. 

Therefore, the optimal solutions $f_{\tilde{X}^*}$ and $f_{\tilde{Y}^*}$ maximize the functional problem in (85), and the proof is completed.
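Once the maximizers are known to be Gaussian, the remaining optimization is over covariances. The scalar sketch below is restricted to the Gaussian family and ignores the covariance constraint, both of which are simplifying assumptions made only for illustration. It locates the maximizer of $h(\tilde{X}) - \mu h(\tilde{Y})$ on a grid and compares it with the stationary point $\sigma^2_{\tilde{X}} = \sigma^2_{\hat{V}}/(\mu - 1)$ obtained by setting the derivative to zero. The values of `mu` and `var_v_hat` are hypothetical.

```python
import numpy as np

mu = 3.0          # hypothetical weight, mu > 1
var_v_hat = 2.0   # hypothetical variance of the scalar noise V^

def objective(s):
    """h(X~) - mu * h(Y~) for scalar Gaussians with Var(X~) = s and Var(Y~) = s + var_v_hat."""
    h_x = 0.5 * np.log(2.0 * np.pi * np.e * s)
    h_y = 0.5 * np.log(2.0 * np.pi * np.e * (s + var_v_hat))
    return h_x - mu * h_y

grid = np.linspace(0.01, 10.0, 100000)
s_best = grid[np.argmax(objective(grid))]

# Stationary point of the objective: 1/s = mu / (s + var_v_hat).
s_closed_form = var_v_hat / (mu - 1.0)

print(s_best, s_closed_form)
```

The grid maximizer agrees with the closed-form stationary point up to the grid resolution, which matches the scalar version of $\Sigma_{X^*} + \Sigma_{\tilde{W}} = (\mu - 1)^{-1}\Sigma_{\hat{V}}$ when the covariance constraint is inactive.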

Appendix D
Details of an Application for the Broadcasting Channel with a Private Message

Using Lemma 8, we can define a covariance matrix $\Sigma_{\tilde{Z}_{G_1}}$ which satisfies $\Sigma_{\tilde{Z}_{G_1}} \preceq \Sigma_{Z_{G_1}}$ and $\Sigma_{\tilde{Z}_{G_1}} \preceq \Sigma_{Z_{G_2}}$ as follows:

$$\Sigma_{\tilde{Z}_{G_1}} = \left(\Sigma_{Z_{G_1}}^{-1} + K\right)^{-1},$$

where $K$ is a positive semi-definite matrix, defined similarly to the one in Lemma 8.
Since

$$\Sigma_{X|Y_2} = \Sigma_{Z_{G_2}} - \Sigma_{Z_{G_2}}\,J(X + Z_{G_2})\,\Sigma_{Z_{G_2}}, \tag{146}$$

where $J(X + Z_{G_2})$ denotes the Fisher information matrix of the random vector $X + Z_{G_2}$ [6], by changing the covariance matrix of $X$, we can always find $X$ whose posterior covariance matrix $\Sigma_{X|Y_2}$ satisfies $\mathrm{Tr}\{\Sigma_{X|Y_2}\} = \mathrm{Tr}\{R\}$.
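For a Gaussian input, (146) can be checked directly, because the Fisher information matrix of a Gaussian vector is the inverse of its covariance matrix. The sketch below uses hypothetical covariance matrices and compares (146) with the usual expression $\Sigma_X - \Sigma_X(\Sigma_X + \Sigma_{Z_{G_2}})^{-1}\Sigma_X$ for the posterior covariance.

```python
import numpy as np

sigma_x = np.array([[1.5, 0.4], [0.4, 0.8]])     # hypothetical Gaussian input covariance
sigma_z2 = np.array([[2.0, -0.3], [-0.3, 1.0]])  # hypothetical noise covariance at receiver 2

# Fisher information matrix of the Gaussian vector X + Z_G2.
fisher = np.linalg.inv(sigma_x + sigma_z2)

# Posterior covariance via (146) and via the standard Gaussian formula.
post_cov_146 = sigma_z2 - sigma_z2 @ fisher @ sigma_z2
post_cov_direct = sigma_x - sigma_x @ np.linalg.inv(sigma_x + sigma_z2) @ sigma_x

print(np.allclose(post_cov_146, post_cov_direct))
```

For non-Gaussian $X$ the Fisher information is no longer available in closed form, so this check covers only the Gaussian case.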

Then the random vector $X$ satisfies the following relationship:

$$\mathrm{Tr}\{\Sigma_{X|\tilde{Y}_1}\} \le \mathrm{Tr}\{\Sigma_{X|Y_2}\} = \mathrm{Tr}\{R\}, \tag{147}$$

where $\tilde{Y}_1 = X + \tilde{Z}_{G_1}$. The first inequality is due to the data processing inequality [6].

Using the Cramér-Rao inequality [6], we can choose a Gaussian random vector $X_G^*$ whose covariance matrix $\Sigma_{X^*}$ satisfies the following:

$$\Sigma_{X^*} + \Sigma_{Z_{G_2}} = J(X + Z_{G_2})^{-1} \preceq J(X_G + Z_{G_2})^{-1} = \Sigma_{X_G} + \Sigma_{Z_{G_2}}, \tag{148}$$

and

$$\Sigma_{X^*} \preceq \Sigma_{X_G}. \tag{149}$$

Therefore, for any random vector $X$ whose covariance matrix $\Sigma_X$ satisfies $\mathrm{Tr}\{\Sigma_{X|Y_2}\} = \mathrm{Tr}\{R\}$, we can find a Gaussian random vector $X_G^*$ whose covariance matrix satisfies the relationship in (149). Also, due to (146) and (148),

$$\Sigma_{X_G^*|Y_2^*} = \Sigma_{X|Y_2},$$

where $Y_2^* = X_G^* + Z_{G_2}$.

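Now, based on Lemma 8, we will show $\Sigma_{X_G^*|\tilde{Y}_1^*} = \Sigma_{X_G^*|Y_1^*}$ as follows. Since $Y_1^* = X_G^* + Z_{G_1} = X_G^* + \tilde{Z}_{G_1} + \hat{Z}_{G_1}$, we can construct a Markov chain as

$$X_G^* \to X_G^* + \tilde{Z}_{G_1} \to X_G^* + \tilde{Z}_{G_1} + \hat{Z}_{G_1}, \tag{150}$$

where $\hat{Z}_{G_1}$ is a Gaussian random vector with the covariance matrix $\Sigma_{\hat{Z}_{G_1}}$, and it satisfies $\Sigma_{Z_{G_1}} = \Sigma_{\tilde{Z}_{G_1}} + \Sigma_{\hat{Z}_{G_1}}$.

The Markov chain in (150) is the same as the one in (68), and therefore, based on Lemma 8, we can obtain the Markov chain:

$$X_G^* \to X_G^* + \tilde{Z}_{G_1} + \hat{Z}_{G_1} \to X_G^* + \tilde{Z}_{G_1},$$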

and this Markov chain is the same as the one in (|69l ). In this case, = (^^Za ' ^^'^ ^^g 

defined as a5]^ —'^'7 , where and Zgo are Gaussian random vectors with covariance matrices 

^Gi ^ ^ 

^Zg2 ^"'^ ^Zg,' respectively, Zg^ = Zg, + Zg^ + Zg^, '^Zo^ =^Zg,^ '^Zg^ ^^Zg^' random 
vectors are independent of one another. The positive semi-definite matrix K is defined as the one in 
Lemma [8] The constant a must be chosen to satisfy the equation in (11481 ). By defining the matrix 

^G2 

as follows: 

where matrix $L$ is defined similarly to the one in Lemma 7, the existence of such $X_G^*$ is guaranteed. Therefore, by choosing a Gaussian random vector $X_G^*$ as mentioned previously,

$$\mathrm{Tr}\{\Sigma_{X_G^*|Y_1^*}\} \le \mathrm{Tr}\{\Sigma_{X_G^*|Y_2^*}\} = \mathrm{Tr}\{R\},$$

and the covariance matrix $\Sigma_{X^*}$ is the minimum with respect to the positive semi-definite partial ordering, and the proof is completed.